Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define rule instance table for Minder #3459

Merged
merged 7 commits into from
May 31, 2024
Merged

Conversation

dmjb
Copy link
Contributor

@dmjb dmjb commented May 30, 2024

Rule instances are currently stored as serialized protobuf structures in the database, grouped by entity type. A separate join table exists to track the relationship between a profile-entity and the rule types which are used. This arrangement leads to some rather awkward code and database queries making the profile-related code in Minder more complex than it should be.

This PR adds in a definition of a dedicated rule instance table. Subsequent PRs will migrate data over to it and then change the codebase to query this table instead of the current tables.

Summary

Provide a brief overview of the changes and the issue being addressed.
Explain the rationale and any background necessary for understanding the changes.
List dependencies required by this change, if any.

Fixes #(related issue)

Change Type

Mark the type of change your PR introduces:

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

Outline how the changes were tested, including steps to reproduce and any relevant configurations.
Attach screenshots if helpful.

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@dmjb dmjb requested a review from a team as a code owner May 30, 2024 08:33
@coveralls
Copy link

coveralls commented May 30, 2024

Coverage Status

coverage: 52.393% (+0.01%) from 52.382%
when pulling 26faef0 on rule-instance-table-migration
into 330ccec on main.

@dmjb dmjb force-pushed the rule-instance-table-migration branch from 5c3348f to df59d14 Compare May 30, 2024 16:32
$1,
$2,
sqlc.arg(contextual_rules)::jsonb,
FALSE
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant given that the default value is FALSE for this column, but I want to set it explicitly to remind myself to change it in the next PR.

evankanderson
evankanderson previously approved these changes May 30, 2024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem like they shouldn't be part of this PR -- did we miss an autogeneration step earlier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, this has been moved to another branch.

params JSONB NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
-- equivalent to constraints enforced by rule validation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also creates an index implicitly. 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is at least one query in my next PR which queries by these three columns. I also need it for an upsert query.


BEGIN;

CREATE TABLE IF NOT EXISTS rule_instances(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we like the name rule_instances? If I understand correctly, each row represents a single rule definition in a profile -- is that correct?

What do you think of the name profile_rules for this table?

Copy link
Contributor Author

@dmjb dmjb May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Rule instances" is the term I've heard used most often within the team to describe the rules which hang off a profile, so I think I'll stick to that for now.

Comment on lines 23 to 24
def JSONB NOT NULL,
params JSONB NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR probably, but what's the difference between def and params? In either case, these match a schema from rule_type_id, correct? It might be worth a comment to that effect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, they are parameters to the def and params blocks in a rule type... will add a comment to that effect.

Comment on lines +25 to +26
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want these separately from created_at/updated_at on profiles? If so, how do we handle these fields when someone PUTs a Profile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently store created/updated at in the entity_profiles table, so I decided to carry it forward here.

When someone PUTs a profile, the rule instances in the updated profile are upserted, and the updated time is updated on conflict... that behaviour will be carried forward in the queries for this table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question here was:

Are there cases where the created_at or updated_at fields on different rule_instances with the same profile_id will have different values?

It sounds like the answer is "yes" if I start with a Profile with 2 rules, then add a third rule later.

id UUID NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
profile_id UUID NOT NULL REFERENCES profiles(id) ON DELETE CASCADE,
rule_type_id UUID NOT NULL REFERENCES rule_type(id),
name TEXT NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to be a pain, but is name case-sensitive?

Copy link
Contributor Author

@dmjb dmjb May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is case sensitive in the existing code, and so I preserved that behaviour. Keep in mind that the rule names are A) optional and autogenerated if they are not present and B) are far less commonly used in the API contract than rule type or profile names, so it is less likely that someone will type a rule name with the wrong case.

created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
-- equivalent to constraints enforced by rule validation
UNIQUE (profile_id, entity_type, name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - a rule can apply to more than one entity type, and we allow the same name if it's applying to a different type of entity? (Is that what we want?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combination of entity type + rule name is unique for a given profile - you can reuse the same rule name across different entity type blocks in the YAML. This reflects the existing behaviour/validation rules for rule names.

CREATE INDEX idx_rule_instances_profile_entity ON rule_instances(profile_id, entity_type);

-- this will be used for migration purposes
ALTER TABLE entity_profiles ADD COLUMN migrated BOOL DEFAULT FALSE NOT NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so we're going to use this to mark whether to look in rule_instances or in the column data for profiles for the rule definitions? SGTM, that will allow us to migrate without too many dual-writes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. The next PR will begin dual writing, and will set this migrated flag to true when writing to the old tables to signify that they do not need to be migrated. The PR after that will run a DB migration to move all rule instances to the new table.


BEGIN;

CREATE TABLE IF NOT EXISTS rule_instances(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not reuse entity_profile_rules that already contains the link between an entity_profile (which links to a profile) and a rule_type_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that we have a whole bunch of complexity throughout the code which results from storing each rule instance in a JSONB array instead of having a table of the individual rule instances.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the JSON storage issue is understandable. However, that is not the table I'm talking about. We already have a table that is meant for rule instances, where we store the link between profiles and rule types. And that is entity_profie_rules. we could leverage that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmjb this is the table I'm talking about https://github.com/stacklok/minder/blob/main/internal/db/models.go#L467

For each profile in an entity's context we store the instantiation of a rule type. This is done to prevent deleting rule types when they're instantiated by profile. Instead of creating a new table, why not use what we already have and enhance it with the def and params information?

Copy link
Contributor Author

@dmjb dmjb May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A profile may contain more than one instance of a rule type. In the new table, (profile_id, entity_type, rule_type_id) is not unique - I had tried to create this constraint first and then discovered during testing that this is an incorrect assumption.

However, in entity_profile_rules, (entity_profile_id, rule_type_id) is unique. N different rule instances with the same rule_type_id will create a single row in entity_profile_rules. Since I want to dual write to my new structure and the old structure until we migrate over, I don't think it's a good idea to change the uniqueness constraint on entity_profile_rules so that we can reuse that table.

dmjb added 3 commits May 31, 2024 07:53
Rule instances are currently stored as serialized protobuf structures in
the database, grouped by entity type. A separate join table exists to
track the relationship between a profile-entity and the rule types which
are used. This arrangement leads to some rather awkward code and
database queries making the profile-related code in Minder more complex
than it should be.

This PR adds in a definition of a dedicated rule instance table.
Subsequent PRs will migrate data over to it and then change the codebase
to query this table instead of the current tables.
@dmjb dmjb force-pushed the rule-instance-table-migration branch from 9ed0eb3 to 1e7d5fb Compare May 31, 2024 06:53
@dmjb dmjb force-pushed the rule-instance-table-migration branch from c50d079 to 2a5da5c Compare May 31, 2024 08:35
@dmjb dmjb merged commit 19a62a2 into main May 31, 2024
21 checks passed
@dmjb dmjb deleted the rule-instance-table-migration branch May 31, 2024 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants