Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added ListEvaluationHistory RPC implementation. #3784

Merged
merged 13 commits into from
Jul 10, 2024
Merged

Conversation

blkt
Copy link
Contributor

@blkt blkt commented Jul 4, 2024

Summary

ListEvaluationHistory RPC returns a paginated list of evaluation events based on the given cursor and filter.

Routines for managing a filter are implemented as well, in a way that should be reusable within other endpoints supporting filters. This implementation only allows and-joined predicates. Such predicates allow only or-joined equality/inequality checks. Simply put, if a filter entry starts with the exclamation mark, it is added to the inequality check, while it is added to the equality check otherwise. Finally, timerange based filtering is supported.

Routines for managing a cursor are implemented as well, but are not intended to be generic. Cursors are tightly coupled with the underlying extraction logic and are harder to refactor, and the additional effort was not considered valuable at this time.

Fixes #3746

Change Type

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

  • unit tests
  • manual tests (locally)

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@blkt blkt self-assigned this Jul 4, 2024
@blkt blkt requested a review from a team as a code owner July 4, 2024 12:40
proto/minder/v1/minder.proto Outdated Show resolved Hide resolved
@coveralls
Copy link

Coverage Status

coverage: 52.772% (+0.8%) from 52.002%
when pulling 01b6441 on history-service-impl
into ff2be84 on main.

@coveralls
Copy link

Coverage Status

coverage: 52.777% (+0.8%) from 52.002%
when pulling 474f589 on history-service-impl
into ff2be84 on main.

@coveralls
Copy link

Coverage Status

coverage: 52.984% (+0.8%) from 52.213%
when pulling d06da73 on history-service-impl
into 2e004e1 on main.

OR sqlc.narg(tots)::timestamp without time zone IS NULL
OR s.most_recent_evaluation BETWEEN sqlc.narg(fromts) AND sqlc.narg(tots))
ORDER BY s.most_recent_evaluation DESC
LIMIT sqlc.arg(size)::integer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have already been discussed, but have you considered using a SQL generator, like https://github.com/Masterminds/squirrel? I'm worried this raw SQL will become unmaintainable and fragile, especially as we add these types of filters to more endpoints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started off thinking it was not possible to implement this without an SQL generator, and decided to prove that.
It turned out to be possible instead, so I haven't looked any further.

It is worth noting that the simple filtering logic we decided to implement only allows where conditions of the form

(X = val1 OR X = val2) AND (Y = val3 OR Y = val4)

These conditions are always definable in static SQL, with an additional column1 IS NOT NULL OR at the beginning. Filtering and pagination as discussed in design docs do not allow for more complex filters, and in such cases general search should be implemented instead.

I was not aware of the existence of squirrel, while I originally considered goqu, I'll have a look at both. Besides, the query is fairly efficient and using sqlc we have all the structs and bindings autogenerated, which I don't think is the case using squirrel.

I understand the query generator approach may still be preferable, I'll test squirrel on top of this branch to see what's the impact.

@coveralls
Copy link

Coverage Status

coverage: 52.989% (+0.8%) from 52.204%
when pulling f48bcf2 on history-service-impl
into 6274fe5 on main.

internal/history/service.go Outdated Show resolved Hide resolved
internal/history/models.go Outdated Show resolved Hide resolved
internal/history/models.go Outdated Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Jul 8, 2024

Coverage Status

coverage: 53.085% (+1.0%) from 52.131%
when pulling e1c933d on history-service-impl
into 3931da3 on main.

proto/minder/v1/minder.proto Outdated Show resolved Hide resolved
Comment on lines 2562 to 2571
message Entity {
// id is the ID of the entity.
string id = 1;

// rule contains details of the rule which the entity was evaluated against.
EvaluationHistoryRule rule = 2;
// type is the entity type.
minder.v1.Entity type = 2;

// status contains the evaluation status.
EvaluationHistoryStatus status = 3;
// name is the entity name.
string name = 3;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments:

  1. I've rarely been happy in the future with where a nested message has been placed -- I'd lean towards making this a top-level message.
  2. Can you include comments about uniqueness of the various fields, and ability to use those fields in future calls? (For example, I assume that id is globally unique, but is name globally unique, or only unique within the scope of a Provider?)

@eleftherias on provider-scoping entity names, as I know she was looking to remove that linkage (and I was advocating for keeping it).

Comment on lines 2573 to 2582
message Rule {
// name is the name of the rule instance.
string name = 1;

// remediation contains details of the remediation for this evaluation.
EvaluationHistoryRemediation remediation = 5;
}
// type is the name of the rule type.
string type = 2;

message EvaluationHistoryEntity {
// id is the ID of the entity.
string id = 1;
// profile is the name of the profile which contains the rule.
string profile = 3;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels odd to have type fields in both the Entity and Rule messages that mean different things -- Entity.type is an enum (IIRC), while Rule.type is effectively the name of a template. Can we call this rule_type to disambiguate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, the lack of id here means that profile + name is the distinguishing set of fields for future follow-up queries.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evankanderson Are you referring to the rule type ID, or the rule instance ID?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a rule instance ID if you wanted to be able to do a unique follow-up query. (i.e. looking for more history for items which don't have any history or have some suspicious changes over a shorter time window)

Comment on lines 2589 to 2591
// details contains optional details about the evaluation.
// the structure and contents are rule type specific, and are subject to change.
string details = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of having a top-level stringly-typed details that has "rule type specific" contents and structure. That may be where we are at right now, but it seems like this could mean that one rule emits markdown, another plain-text, and a third XML fragments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for line 2604

Comment on lines 2602 to 2604
// details contains optional details about the remediation.
// the structure and contents are remediation specific, and are subject to change.
string details = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on stringly-typed details. Again, I'm willing to accept this as a bridge, but we should get to a more well-defined outcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was discussed during the design review - the proposal was to match the current state of Minder and work on improving remediations/alerts later.

// type is the name of the rule type.
string type = 2;
message Remediation {
// status is one of (success, error, failure, skipped, not available)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like Remediation has one extra status (not available) compared with Status. It feels odd to me that Remediation isn't part of the status and doesn't have a Timestamp -- can we have a Remediation without a Status, or a different time-scale?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, we can't have a remediation that happens at a different time to an evaluation (within the margin of a few seconds or so). The engine also tracks the status of the evaluation and the remediation separately.

Comment on lines 2607 to 2615
message Alert {
// status is one of (on, off, error, skipped, not available)
// not using enums to mirror the behaviour of the existing API contracts.
string status = 1;

// details contains optional details about the evaluation.
// the structure and contents are rule type specific, and are subject to change.
string details = 2;
}
// details contains optional details about the alert.
// the structure and contents are alert specific, and are subject to change.
string details = 2;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments on Remediation also apply to Alert. In fact, I'm wondering whether an Alert is a particular type of remediation. @puerco @ethomson

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this moment in time, they might. I am not so sure that they will remain the same as we evolve them in future.

Comment on lines 2623 to 2630
// status contains the evaluation status.
EvaluationHistory.Status status = 3;

// alert contains details of the alerts for this evaluation.
EvaluationHistory.Alert alert = 4;

// details contains optional details about the alert.
// the structure and contents are alert specific, and are subject to change.
string details = 2;
// remediation contains details of the remediation for this evaluation.
EvaluationHistory.Remediation remediation = 5;
Copy link
Member

@evankanderson evankanderson Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want a single status / alert / remediation for each entry? It feels like we'll end up repeating a lot of the envelope (rule, entity) many times when querying history. I'd sort of prefer to see something like this:

message HistoryResult {
  Timestamp start = 1;
  uint32 occurrences = 2;
  string status = 3;  // Include both rule eval and alert / remediate result
  // Temp, while we work out schema
  string status_detail = 4;
  string alert_detail = 5;
  string remediate_detail = 6;
}

message EntityRuleEvaluationHistory {
  EntityInfo entity = 1;
  RuleInfo rule = 2;
  // history is ordered from most recent to oldest start time.  (reverse sort)
  repeated HistoryResult history = 3;
}

message ListEvaluationHistoryResult {
  Pagination pagination = 1;
  // History for one entity will be complete in results before there is history for another entity.  (e.g. paging is by entity, not within an entity's history)
  repeated EntityRuleEvaluationHistory results = 2;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by this comment, both ListEvaluationHistoryResult and EntityRuleEvaluationHistory embed a repeated HistoryResult, that's a typo, right?

I don't know how to make progress on this, I'd be happy to discuss if we want to change message structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I meant for ListEvaluationHistoryResult to embed a repeated EntityRuleEvaluationHistory. Updated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that we're trying to present something different (a log-history based view of history), so my suggestion doesn't really work.

Copy link
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments looking at the SQL query (which I'm very impressed by!).

database/query/eval_history.sql Outdated Show resolved Hide resolved
database/query/eval_history.sql Show resolved Hide resolved
database/query/eval_history.sql Outdated Show resolved Hide resolved
proto/minder/v1/minder.proto Outdated Show resolved Hide resolved
blkt added 9 commits July 10, 2024 19:00
ListEvaluationHistory RPC returns a paginated list of evaluation
events based on the given cursor and filter.

Routines for managing a filter are implemented as well, in a way that
should be reusable within other endpoints supporting filters. This
implementation only allows and-joined predicates. Such predicates
allow only or-joined equality/inequality checks. Simply put, if a
filter entry starts with the exclamation mark, it is added to the
inequality check, while it is added to the equality check
otherwise. Finally, timerange based filtering is supported.

Routines for managing a cursor are implemented as well, but are not
intended to be generic. Cursors are tightly coupled with the
underlying extraction logic and are harder to refactor, and the
additional effort was not considered valuable at this time.

Fixes #3746
This was necessary for local testing because of the following issue.
#3775
@blkt blkt force-pushed the history-service-impl branch 2 times, most recently from c873780 to b3f5df7 Compare July 10, 2024 17:09
@blkt blkt merged commit 87416ec into main Jul 10, 2024
21 of 22 checks passed
@blkt blkt deleted the history-service-impl branch July 10, 2024 17:37
dmjb pushed a commit that referenced this pull request Jul 12, 2024
ListEvaluationHistory RPC returns a paginated list of evaluation
events based on the given cursor and filter.

Routines for managing a filter are implemented as well, in a way that
should be reusable within other endpoints supporting filters. This
implementation only allows and-joined predicates. Such predicates
allow only or-joined equality/inequality checks. Simply put, if a
filter entry starts with the exclamation mark, it is added to the
inequality check, while it is added to the equality check
otherwise. Finally, timerange based filtering is supported.

Routines for managing a cursor are implemented as well, but are not
intended to be generic. Cursors are tightly coupled with the
underlying extraction logic and are harder to refactor, and the
additional effort was not considered valuable at this time.

Fixes #3746
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement API endpoints for evaluation history
6 participants