-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add git ingester #775
Add git ingester #775
Conversation
1ff0075
to
85a06f8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some comments inline. I'm a bit confused about the tracking. Also cloning on every ingest might get expensive fast, should we think about implementing a timeout or LRU?
internal/engine/ingester/git/git.go
Outdated
// Git is the engine for a rule type that uses git data ingest | ||
type Git struct { | ||
accessToken string | ||
fileTrack map[string]struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could the ingester ever be called concurrently? (thinking about syncMap)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there is a single instance of the ingester being executed at a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for anyone else confused like me, we do instantiate a per-rule instance for every rule which in turns instantiates a new evaluator per rule (thanks @JAORMX for the explanation on slack)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more comment about auth - didn't see it earlier, sorry. I guess in order for this PR to not be WIP, we also need to a new Message to mediator.proto
that represents the git ingester along with the plumbing to instantiate it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure where you're headed with this yet, but the git clone
part looks reasonable.
// Eval calls the jq library to evaluate the rule | ||
func (jqe *Evaluator) Eval(ctx context.Context, pol map[string]any, obj any) error { | ||
func (jqe *Evaluator) Eval(ctx context.Context, pol map[string]any, res *engif.Result) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a type assertion that Evaluator
fulfils engif.Evaluator
?
// Eval calls the jq library to evaluate the rule | |
func (jqe *Evaluator) Eval(ctx context.Context, pol map[string]any, obj any) error { | |
func (jqe *Evaluator) Eval(ctx context.Context, pol map[string]any, res *engif.Result) error { | |
var _ engif.Evaluator = (*Evaluator)(nil) | |
// Eval calls the jq library to evaluate the rule | |
func (jqe *Evaluator) Eval(ctx context.Context, pol map[string]any, res *engif.Result) error { |
// We clone to the memfs go-billy filesystem driver, which doesn't | ||
// allow for direct access to the underlying filesystem. This is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing -- I think you mean "doesn't store any contents on the local filesystem". The fs
object can still be used to read files from the repo.
One future enhancement item might be to be able to further filter the files included in the fetch, so we don't need to allocate hundreds of megabytes for large repos. I think a TODO is sufficient for that case, though.
internal/engine/ingester/git/git.go
Outdated
} | ||
|
||
return &engif.Result{ | ||
Object: struct{}{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should Object
be nil
here or r
, rather than an empty struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm not sure I understand the semantics of Result
, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, too, is this supposed to ensure that evaluators that consume an Object will not throw an error, but rather never evaluate? I would say that nil
might be more explicit and safer (thinking about an jq
evaluator that makes sure that an input does NOT have a path, I think this empty object would pass that evaluator..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
42a9691
to
f5b2ff5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits and questions, but overall looks good. Thanks for adding the tests!
// Fs is the filesystem that was created as a result of the ingestion. This | ||
// is normally used by the evaluator to do rule evaluation. The filesystem | ||
// may be a git repo, or a memory filesystem. | ||
Fs billy.Filesystem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of evaluator will be plugged into the git ingestor that consumes the FS? Evaluating vulnerabilities comes to mind, but what I wondered when I read this code is if we want to be able to express a contract between an ingestor and evaluator and throw an error on parsing the rule that ingestor X cannot work with evaluator Y rather than throwing a runtime error saying that an object or a FS is missing during evaluation..
// in different calls as opposed to having to set it in the rule type. | ||
type IngesterConfig struct { | ||
Branch string `json:"branch" yaml:"branch" mapstructure:"branch"` | ||
CloneURL string `json:"clone_url" yaml:"clone_url" mapstructure:"clone_url"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the size limit option would come at a later point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's right
internal/engine/ingester/git/git.go
Outdated
} | ||
|
||
return &engif.Result{ | ||
Object: struct{}{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, too, is this supposed to ensure that evaluators that consume an Object will not throw an error, but rather never evaluate? I would say that nil
might be more explicit and safer (thinking about an jq
evaluator that makes sure that an input does NOT have a path, I think this empty object would pass that evaluator..)
} | ||
|
||
// If the entity is a repository get it from the entity | ||
// else, get it from the configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't seem to match what the code does (IOW, the code prefers the config)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw what is the use-case for git ingester w/o a repository?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a solid answer right now, but I'm guessing we could do something interesting with comparing contents from artifacts with something in a git repo.
gi := gitengine.NewGitIngester(&pb.GitType{}, "") | ||
got, err := gi.Ingest(context.Background(), &pb.Artifact{}, map[string]any{ | ||
"clone_url": "https://github.com/octocat/Hello-World.git", | ||
"branch": "test", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to assert that we cloned the right branch here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That branch has other contents, that the master branch doesn't have, maybe it would be sufficient to check that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but it's fine to do it later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually let me approve this so that you're not blocked (I'd be curious about the questions I had either way)
d4214cf
to
a3461ca
Compare
The git ingester will take a specific branch, clone it in a temporary directory, and set it as a context for the policy to evaluate on top of it. The intention is to be able to run policies on git contents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, there is some follow-up work to do I guess, but this PR can go in as-is
The git ingester will take a specific branch, clone it in a temporary
directory, and set it as a context for the policy to evaluate on top
of it.
The intention is to be able to run policies on git contents.