Migrate remaining data categories #5073

Merged
galvana merged 15 commits into main from PROD-2210-migrate-additional-fideslang-data-categories
Jul 28, 2024

Conversation

galvana
Contributor

@galvana galvana commented Jul 9, 2024

Closes PROD-2210

Description Of Changes

The following data categories weren't migrated during the last Fideslang migration:

  • user.biometric_health -> user.biometric.health
  • user.credentials.biometric_credentials -> user.authorization.biometric
  • user.credentials.password -> user.authorization.password

Additionally, the default access and erasure policies could be missing the following data categories if they were created before Fideslang 2.0 was released:

  • user.behavior
  • user.content
  • user.privacy_preferences

This PR migrates the missed data categories for all policies, but adds the missing data categories only to the default access and erasure policies. We don't want to make any assumptions about policies that were created by users.
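As a minimal sketch of the rename step described above: the mapping below is taken directly from the list in this description, while the helper name `migrate_category` is hypothetical and only illustrates the idea. It is not the actual migration code.

```python
# Old -> new data category keys, as listed in this PR description.
RENAMED_CATEGORIES = {
    "user.biometric_health": "user.biometric.health",
    "user.credentials.biometric_credentials": "user.authorization.biometric",
    "user.credentials.password": "user.authorization.password",
}


def migrate_category(category: str) -> str:
    """Return the Fideslang 2.0 key for a category, or the input if it was not renamed.

    Hypothetical helper for illustration only.
    """
    return RENAMED_CATEGORIES.get(category, category)


# Example: targets on a rule before the migration.
targets = ["user.biometric", "user.biometric_health", "user.credentials.password"]
migrated = [migrate_category(c) for c in targets]
```

Categories that were already valid in Fideslang 2.0, such as user.biometric, pass through unchanged.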

Code Changes

  • Extracted shared logic from the last Fideslang migration so it could be reused for this new migration
  • Added two new migration functions update_default_dsr_policies and remove_conflicting_rule_targets
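A rough sketch of what adding the missing categories to only the default rules could look like. It uses Python's built-in sqlite3 as a stand-in for the Postgres connection, the table and column names come from the verification query in this PR, and the function name `add_missing_default_targets` is hypothetical. The real update_default_dsr_policies may differ.

```python
import sqlite3
import uuid

DEFAULT_RULE_KEYS = ("default_access_policy_rule", "default_erasure_policy_rule")
NEW_CATEGORIES = ("user.behavior", "user.content", "user.privacy_preferences")


def add_missing_default_targets(conn):
    """Insert missing rule targets on the default rules only (hypothetical helper).

    Does not commit; the caller owns the transaction.
    """
    inserted = []
    for rule_key in DEFAULT_RULE_KEYS:
        row = conn.execute("SELECT id FROM rule WHERE key = ?", (rule_key,)).fetchone()
        if row is None:
            continue  # default rule not present in this deployment
        rule_id = row[0]
        for category in NEW_CATEGORIES:
            exists = conn.execute(
                "SELECT 1 FROM ruletarget WHERE rule_id = ? AND data_category = ?",
                (rule_id, category),
            ).fetchone()
            if exists is None:
                conn.execute(
                    "INSERT INTO ruletarget (id, rule_id, data_category) VALUES (?, ?, ?)",
                    (str(uuid.uuid4()), rule_id, category),
                )
                inserted.append((rule_key, category))
    return inserted


# Tiny in-memory setup: two default rules and one user-created rule.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rule (id TEXT PRIMARY KEY, key TEXT UNIQUE)")
conn.execute("CREATE TABLE ruletarget (id TEXT PRIMARY KEY, rule_id TEXT, data_category TEXT)")
conn.executemany(
    "INSERT INTO rule VALUES (?, ?)",
    [
        ("r1", "default_access_policy_rule"),
        ("r2", "default_erasure_policy_rule"),
        ("r3", "custom_rule"),
    ],
)
conn.execute("INSERT INTO ruletarget VALUES ('t1', 'r1', 'user.content')")

first = add_missing_default_targets(conn)
second = add_missing_default_targets(conn)  # running again inserts nothing
custom_count = conn.execute(
    "SELECT COUNT(*) FROM ruletarget WHERE rule_id = 'r3'"
).fetchone()[0]
```

The existence check makes the step safe to re-run, and the user-created rule is never touched.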

Steps to Confirm

  • Start Fides with nox -s dev and connect to the Postgres database.
  • Run the SQL commands from setup_script.txt. This will reset the default policies to a state before the previous Fideslang 2.0 migration.
  • Run this query to verify the current state of key data categories
SELECT r.key AS rule_key, rt.data_category
FROM ruletarget rt
INNER JOIN rule r ON rt.rule_id = r.id
WHERE data_category IN (
  'user.behavior', 'user.content', 'user.privacy_preferences',
  'user.biometric.health', 'user.authorization.biometric', 'user.authorization.password',
  'user.biometric_health', 'user.credentials.biometric_credentials', 'user.credentials.password',
  'user.biometric'
)
ORDER BY rule_key, data_category;

We should see these records, which contain the old data categories plus user.biometric (which already existed in Fideslang 1.4):

          rule_key           |             data_category              
-----------------------------+----------------------------------------
 default_access_policy_rule  | user.biometric
 default_access_policy_rule  | user.biometric_health
 default_access_policy_rule  | user.credentials.biometric_credentials
 default_access_policy_rule  | user.credentials.password
 default_erasure_policy_rule | user.biometric
 default_erasure_policy_rule | user.biometric_health
 default_erasure_policy_rule | user.credentials.biometric_credentials
 default_erasure_policy_rule | user.credentials.password
(8 rows)
  • Connect to the Fides container with nox -s shell and navigate to the Alembic directory cd src/fides/api/alembic.
  • Run alembic downgrade -1. This won't reset anything (that was done by the commands in step 2) but it will allow us to run the migration again.
  • Run alembic upgrade head and take note of the messages in the logs.
2024-07-26 00:11:48.717 | INFO     | a6d9cdfcc7dc_migrate_remaining_data_categories_py:upgrade:65 - Removing old default data categories
2024-07-26 00:11:48.719 | INFO     | a6d9cdfcc7dc_migrate_remaining_data_categories_py:upgrade:68 - Upgrading additional Privacy Declarations for Fideslang 2.0
2024-07-26 00:11:48.720 | INFO     | a6d9cdfcc7dc_migrate_remaining_data_categories_py:upgrade:71 - Upgrading additional Policy Rules for Fideslang 2.0
2024-07-26 00:11:48.720 | INFO     | a6d9cdfcc7dc_migrate_remaining_data_categories_py:upgrade:74 - Upgrading additional Data Categories in Datasets
2024-07-26 00:11:48.721 | INFO     | a6d9cdfcc7dc_migrate_remaining_data_categories_py:upgrade:77 - Upgrading additional Data Categories in System egress/ingress
2024-07-26 00:11:48.722 | INFO     | a6d9cdfcc7dc_migrate_remaining_data_categories_py:upgrade:80 - Updating additional Rule Targets
2024-07-26 00:11:48.730 | INFO     | a6d9cdfcc7dc_migrate_remaining_data_categories_py:upgrade:83 - Upgrading additional Taxonomy Items for Fideslang 2.0
2024-07-26 00:11:48.731 | INFO     | a6d9cdfcc7dc_migrate_remaining_data_categories_py:upgrade:86 - Adding new rule targets to default policies
2024-07-26 00:11:48.734 | INFO     | fides.api.alembic.migrations.helpers.fideslang_migration_functions:update_default_dsr_policies:451 - Inserted new rule target: user.behavior for rule default_access_policy_rule
2024-07-26 00:11:48.735 | INFO     | fides.api.alembic.migrations.helpers.fideslang_migration_functions:update_default_dsr_policies:451 - Inserted new rule target: user.content for rule default_access_policy_rule
2024-07-26 00:11:48.736 | INFO     | fides.api.alembic.migrations.helpers.fideslang_migration_functions:update_default_dsr_policies:451 - Inserted new rule target: user.privacy_preferences for rule default_access_policy_rule
2024-07-26 00:11:48.739 | INFO     | fides.api.alembic.migrations.helpers.fideslang_migration_functions:update_default_dsr_policies:451 - Inserted new rule target: user.behavior for rule default_erasure_policy_rule
2024-07-26 00:11:48.741 | INFO     | fides.api.alembic.migrations.helpers.fideslang_migration_functions:update_default_dsr_policies:451 - Inserted new rule target: user.content for rule default_erasure_policy_rule
2024-07-26 00:11:48.742 | INFO     | fides.api.alembic.migrations.helpers.fideslang_migration_functions:update_default_dsr_policies:451 - Inserted new rule target: user.privacy_preferences for rule default_erasure_policy_rule
2024-07-26 00:11:48.742 | INFO     | fides.api.alembic.migrations.helpers.fideslang_migration_functions:update_default_dsr_policies:461 - The default policies have been updated with new data categories
2024-07-26 00:11:48.742 | INFO     | a6d9cdfcc7dc_migrate_remaining_data_categories_py:upgrade:89 - Removing conflicting rule targets from all erasure policies
2024-07-26 00:11:48.745 | INFO     | fides.api.alembic.migrations.helpers.fideslang_migration_functions:remove_conflicting_rule_targets:389 - Removing conflicting rule target user.biometric.health for rule default_erasure_policy_rule
  • Run the verification query again
SELECT r.key AS rule_key, rt.data_category
FROM ruletarget rt
INNER JOIN rule r ON rt.rule_id = r.id
WHERE data_category IN (
  'user.behavior', 'user.content', 'user.privacy_preferences',
  'user.biometric.health', 'user.authorization.biometric', 'user.authorization.password',
  'user.biometric_health', 'user.credentials.biometric_credentials', 'user.credentials.password',
  'user.biometric'
)
ORDER BY rule_key, data_category;

We should see this:

  • the new data categories (user.behavior, user.content, user.privacy_preferences)
  • the migrated data categories (user.authorization.biometric, user.authorization.password, user.biometric.health)
  • the conflicting user.biometric.health data category removed from the default_erasure_policy_rule but not default_access_policy_rule
          rule_key           |        data_category         
-----------------------------+------------------------------
 default_access_policy_rule  | user.authorization.biometric
 default_access_policy_rule  | user.authorization.password
 default_access_policy_rule  | user.behavior
 default_access_policy_rule  | user.biometric
 default_access_policy_rule  | user.biometric.health
 default_access_policy_rule  | user.content
 default_access_policy_rule  | user.privacy_preferences
 default_erasure_policy_rule | user.authorization.biometric
 default_erasure_policy_rule | user.authorization.password
 default_erasure_policy_rule | user.behavior
 default_erasure_policy_rule | user.biometric
 default_erasure_policy_rule | user.content
 default_erasure_policy_rule | user.privacy_preferences
(13 rows)
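The 13-row result can be reproduced with a small sketch. It assumes that "conflicting" means a rule target whose data category is a descendant of another target on the same erasure rule (here, user.biometric.health under user.biometric). That interpretation, the `action_type` values, and the function name `remove_conflicting_targets` are assumptions for illustration, with sqlite3 standing in for Postgres.

```python
import sqlite3


def remove_conflicting_targets(conn):
    """Delete erasure-rule targets nested under another target on the same rule.

    Hypothetical helper; does not commit.
    """
    rows = conn.execute(
        "SELECT rt.id, rt.rule_id, rt.data_category FROM ruletarget rt "
        "JOIN rule r ON rt.rule_id = r.id WHERE r.action_type = 'erasure'"
    ).fetchall()
    by_rule = {}
    for target_id, rule_id, category in rows:
        by_rule.setdefault(rule_id, []).append((target_id, category))

    removed = []
    for targets in by_rule.values():
        categories = {category for _, category in targets}
        for target_id, category in targets:
            parts = category.split(".")
            # All proper ancestors, e.g. "user" and "user.biometric".
            ancestors = {".".join(parts[:i]) for i in range(1, len(parts))}
            if ancestors & categories:
                conn.execute("DELETE FROM ruletarget WHERE id = ?", (target_id,))
                removed.append(category)
    return removed


# State after the renames and the new default targets: 7 targets per rule.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rule (id TEXT PRIMARY KEY, key TEXT, action_type TEXT)")
conn.execute(
    "CREATE TABLE ruletarget (id INTEGER PRIMARY KEY, rule_id TEXT, data_category TEXT)"
)
conn.executemany(
    "INSERT INTO rule VALUES (?, ?, ?)",
    [
        ("r1", "default_access_policy_rule", "access"),
        ("r2", "default_erasure_policy_rule", "erasure"),
    ],
)
categories = [
    "user.authorization.biometric",
    "user.authorization.password",
    "user.behavior",
    "user.biometric",
    "user.biometric.health",
    "user.content",
    "user.privacy_preferences",
]
for rule_id in ("r1", "r2"):
    conn.executemany(
        "INSERT INTO ruletarget (rule_id, data_category) VALUES (?, ?)",
        [(rule_id, c) for c in categories],
    )

removed = remove_conflicting_targets(conn)
remaining = conn.execute("SELECT COUNT(*) FROM ruletarget").fetchone()[0]
access_count = conn.execute(
    "SELECT COUNT(*) FROM ruletarget WHERE rule_id = 'r1'"
).fetchone()[0]
```

Only the erasure rule loses its nested target, which matches the table above: seven rows for the access rule and six for the erasure rule.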

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Issue Requirements are Met
  • Update CHANGELOG.md
  • If there are any database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!

cypress bot commented Jul 9, 2024

Passing run #9175 ↗︎

Details:

Merge 957fedb into 7977269...
Project: fides Commit: 2b77ff417d ℹ️
Status: Passed Duration: 00:35 💡
Started: Jul 28, 2024 1:59 AM Ended: Jul 28, 2024 2:00 AM

codecov bot commented Jul 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.56%. Comparing base (cd35f63) to head (a05492d).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5073   +/-   ##
=======================================
  Coverage   86.56%   86.56%           
=======================================
  Files         357      357           
  Lines       22349    22349           
  Branches     2954     2954           
=======================================
+ Hits        19346    19347    +1     
  Misses       2480     2480           
+ Partials      523      522    -1     

Contributor Author

The only change here is that I'm pulling out the shared functions



class TestDataCategoryMigrationFunctions:
    def test_remove_conflicting_rule_targets(self, db):
Contributor Author

This is just a straightforward test with one "bad" data category. Let me know if there are any additional scenarios I should cover.

@galvana galvana marked this pull request as ready for review July 16, 2024 22:30
@galvana galvana requested a review from adamsachs July 16, 2024 22:30
@galvana galvana requested review from pattisdr and removed request for adamsachs July 23, 2024 16:03
@pattisdr
Contributor

Starting review -

Contributor

@pattisdr pattisdr left a comment

Thanks for this needed cleanup, @galvana. My main suggestion is to use raw SQL instead of the SQLAlchemy models in the data migration in your two new functions.

Comment on lines 417 to 423
existing_target = RuleTarget.filter(
    db=db,
    conditions=(
        (RuleTarget.rule_id == rule.id)
        & (RuleTarget.data_category == data_category)
    ),
).first()
Contributor

I'm generally leery of using SQLAlchemy models in migrations. If this model gets updated in the future, old migrations will start using the new model, which may reference fields that don't yet exist in the database at that point in the migration history, and that can cause issues. I'd consider refactoring to not use the SQLAlchemy models directly, like the other functions that were copied to this file.
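To illustrate the failure mode, here is a small sketch using Python's built-in sqlite3 rather than SQLAlchemy. The column `added_later` is hypothetical and stands for any field added to the model by a migration that runs after this one. The "ORM-style" query only mimics how a model-based lookup selects every declared column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema as it exists when this (older) migration runs.
conn.execute(
    "CREATE TABLE ruletarget (id TEXT PRIMARY KEY, rule_id TEXT, data_category TEXT)"
)
conn.execute("INSERT INTO ruletarget VALUES ('t1', 'r1', 'user.content')")

# Columns the *current* model declares. `added_later` is hypothetical: it is
# introduced by a later migration that has not run yet at this point.
CURRENT_MODEL_COLUMNS = ["id", "rule_id", "data_category", "added_later"]


def orm_style_lookup(conn, rule_id, data_category):
    # Mimics a model-based query, which selects every column on the model.
    query = (
        f"SELECT {', '.join(CURRENT_MODEL_COLUMNS)} FROM ruletarget "
        "WHERE rule_id = ? AND data_category = ?"
    )
    return conn.execute(query, (rule_id, data_category)).fetchone()


def raw_sql_lookup(conn, rule_id, data_category):
    # Names only the columns this migration actually depends on.
    return conn.execute(
        "SELECT id FROM ruletarget WHERE rule_id = ? AND data_category = ?",
        (rule_id, data_category),
    ).fetchone()


try:
    orm_style_lookup(conn, "r1", "user.content")
    orm_failed = False
except sqlite3.OperationalError:
    orm_failed = True  # the column does not exist yet in this schema

found = raw_sql_lookup(conn, "r1", "user.content")
```

The raw query keeps working regardless of how the model changes later, because it is pinned to the schema the migration was written against.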

Comment on lines 366 to 368
erasure_rules = Rule.filter(
    db=db, conditions=(Rule.action_type == ActionType.erasure)
).all()
Contributor

I'd refactor to not use the Rule model here, and write raw sql queries instead

)
db.delete(rule_target)

db.commit()
Contributor

I'd try to avoid using this commit(). It shouldn't be needed, or the function should be rewritten so it isn't needed; see the other functions in this file.

I think the way we run migrations, it's all wrapped in a single transaction, so all migration scripts are run as part of one transaction and the commit happens at the end. We've had issues where say we're running migrations and something fails or takes too long and we have to bail in the middle - it's pretty straightforward to stop and roll back because nothing was actually committed. We could get in a weird state by deviating from this pattern where we get some of the migration committed but not all of it in the event of a failure.
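A small sketch of the pattern, using sqlite3 as a stand-in for the Alembic-managed connection: the migration helper never commits, so a failure partway through can be rolled back cleanly by whoever owns the transaction. The function names `rename_category` and `failing_step` are hypothetical.

```python
import sqlite3


def rename_category(conn, old, new):
    # No commit here: the caller (Alembic or a test) owns the transaction.
    conn.execute(
        "UPDATE ruletarget SET data_category = ? WHERE data_category = ?",
        (new, old),
    )


def failing_step(conn):
    # Stands in for a later migration step that fails or has to be aborted.
    raise RuntimeError("simulated failure midway through the migration")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ruletarget (id TEXT PRIMARY KEY, data_category TEXT)")
conn.execute("INSERT INTO ruletarget VALUES ('t1', 'user.credentials.password')")
conn.commit()  # pre-migration state

try:
    rename_category(conn, "user.credentials.password", "user.authorization.password")
    failing_step(conn)
    conn.commit()  # only reached if every step succeeded
except RuntimeError:
    conn.rollback()  # nothing was committed, so the rename is undone too

after_failure = conn.execute("SELECT data_category FROM ruletarget").fetchone()[0]
```

Had rename_category committed on its own, the rollback would have left the table half migrated.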

Contributor Author

Got it, I'll remove this commit and move it to the test instead. That way, the context this function is running in (either Alembic or the tests) can decide when to commit.

Contributor Author

@galvana galvana left a comment

Ready for another pass! I updated the manual testing steps and the setup_script.txt so make sure you have the latest version

@pattisdr
Contributor

Starting re-review!

Contributor

@pattisdr pattisdr left a comment

OK i've tested and reviewed the code, this looks good to me. I think we did backups or similar last time before this migration, I'd recommend the same here, since we can't roll back and it touches a lot of places in the database.

I also haven't independently verified whether we're changing all the right data category locations, whether there are any we shouldn't be updating, or whether we've forgotten any.

text("DELETE FROM ruletarget WHERE id IN :target_ids"),
{"target_ids": tuple(target_ids)},
)
logger.info(f"Removed {len(target_ids)} conflicting rule targets")
Contributor

Ah right, and you're only doing this for erasure rules, I was confused in testing why a conflicting access rule target was hanging around.

@galvana galvana merged commit 7d2ed9a into main Jul 28, 2024
13 of 14 checks passed
@galvana galvana deleted the PROD-2210-migrate-additional-fideslang-data-categories branch July 28, 2024 01:48
cypress bot commented Jul 28, 2024

Passing run #9176 ↗︎

Details:

Migrate remaining data categories (#5073)
Project: fides Commit: 7d2ed9a0e3
Status: Passed Duration: 00:36 💡
Started: Jul 28, 2024 1:59 AM Ended: Jul 28, 2024 2:00 AM
