
Bigquery Row-Level Deletes + Erase After on Database Collections #5293

Merged · 21 commits merged into main from PROD-2744-row-deletes on Sep 25, 2024

Conversation

@pattisdr (Contributor) commented Sep 17, 2024

Closes #PROD-2744
Closes #PROD-2745

Needs fideslang ethyca/fideslang#17

Description Of Changes

  1. Adds initial support for collection-level masking overrides, starting with BigQuery, and
  2. extends erase_after to work for database connectors, not just SaaS connectors.
...
   collections:
      - name: address
        fides_meta:
          erase_after: [other_dataset.other_collection]
          masking_strategy_override:
            strategy: delete

Code Changes

  • Specifying a delete masking strategy override at the collection level in a BigQuery dataset YAML causes matching rows to be deleted instead of updated when running an erasure request. The collection-level override takes precedence over the policy-level override. As a first iteration this only takes effect for BigQuery (see the sketch below).
  • Extends erase_after functionality so it is incorporated into the graph when specified for a database connector, not just a SaaS connector. Since there is no intelligent ordering for erasures, this lets customers specify that a given node should be erased after another when necessary.
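
A minimal sketch of that dispatch, using hypothetical names for everything except the masking_strategy_override field itself (the PR's actual wiring presumably lives in the BigQuery connector and its query config):

# Illustrative sketch only, not the PR's actual code: it shows the decision
# described above, where a collection-level masking_strategy_override of
# "delete" takes precedence over the policy-level strategy, BigQuery-only for now.
from typing import Any, Callable, Optional


def choose_erasure_statement(
    collection_meta: Any,  # hypothetical object holding the collection's fides_meta
    is_bigquery: bool,  # whether the connector at hand is a BigQuery connector
    build_delete: Callable[[], Optional[Any]],  # e.g. wraps generate_delete (shown later in this PR)
    build_update: Callable[[], Optional[Any]],  # wraps the existing masking/UPDATE path
) -> Optional[Any]:
    override = getattr(collection_meta, "masking_strategy_override", None)
    if is_bigquery and override is not None and override.strategy == "delete":
        # Row-level delete: remove matching rows instead of nulling masked fields.
        return build_delete()
    # Default behavior: build an UPDATE that masks the configured fields in place.
    return build_update()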

Caution

  • Collections need FKs to facilitate deletes (and updates, for that matter)
  • Erasure order is unchanged. If you require that one node be erased before another, use the erase_after functionality
  • masking_strategy_override settings at the collection level are ignored for all but BigQuery integrations

Steps to Confirm

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation:
    • documentation complete, PR opened in fidesdocs
    • documentation issue created in fidesdocs
    • if there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
  • Issue Requirements are Met
  • Relevant Follow-Up Issues Created
  • Update CHANGELOG.md
  • For API changes, the Postman collection has been updated
  • If there are any database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!

vercel bot commented Sep 17, 2024

1 Skipped Deployment: fides-plus-nightly ⬜️ Ignored · Updated Sep 25, 2024 0:37am (UTC)

@pattisdr mentioned this pull request Sep 17, 2024
cypress bot commented Sep 17, 2024

fides    Run #10122
Run status: Passed
Run duration: 00m 39s
Commit: d8b0787ae5 (merge of 992f9e9e989dc9d7a892b485c818a4b7c073b5c0 into 155e1fd8bcf545599af273b06a62...)
Committer: Dawn Pattison
Branch Review: refs/pull/5293/merge

Test results: Failures 0 · Flaky 0 · Pending 0 · Skipped 0 · Passing 4

… bigquery

- Seed existing bigquery table with employee record
- Updated employee dataset reference direction
- Confirm employee record exists before running DSR and confirm it's been removed afterwards
@pattisdr marked this pull request as ready for review September 19, 2024 22:08
@pattisdr added the run unsafe ci checks label (Runs fides-related CI checks that require sensitive credentials) Sep 19, 2024
@pattisdr changed the title from Bigquery Row-Level Deletes to Bigquery Row-Level Deletes + Erase After on Database Collections Sep 23, 2024
Review comments (outdated, resolved): data/dataset/bigquery_delete_override_test_dataset.yml, requirements.txt
Comment on lines +877 to +905
def generate_delete(self, row: Row, client: Engine) -> Optional[Delete]:
    """Returns a SQLAlchemy DELETE statement for BigQuery. Does not actually execute the delete statement.

    Used when a collection-level masking override is present and the masking strategy is DELETE.
    """
    non_empty_primary_keys: Dict[str, Field] = filter_nonempty_values(
        {
            fpath.string_path: fld.cast(row[fpath.string_path])
            for fpath, fld in self.primary_key_field_paths.items()
            if fpath.string_path in row
        }
    )

    valid = len(non_empty_primary_keys) > 0
    if not valid:
        logger.warning(
            "There is not enough data to generate a valid DELETE statement for {}",
            self.node.address,
        )
        return None

    table = Table(
        self.node.address.collection, MetaData(bind=client), autoload=True
    )
    pk_clauses: List[ColumnElement] = [
        getattr(table.c, k) == v for k, v in non_empty_primary_keys.items()
    ]
    return table.delete().where(*pk_clauses)

pattisdr (Contributor, Author) commented:

New method, for BigQuery only to start, that generates a DELETE statement for the relevant primary keys.
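
As a usage illustration only (not the PR's exact code path), the returned statement could be executed against a SQLAlchemy Engine roughly like this; query_config, row, and bigquery_engine are assumed to already exist:

# Hypothetical usage of generate_delete; the variable names are stand-ins.
delete_stmt = query_config.generate_delete(row, bigquery_engine)
if delete_stmt is not None:
    with bigquery_engine.connect() as connection:
        result = connection.execute(delete_stmt)
        # result.rowcount reports how many matching rows were removed
        logger.info("Deleted {} rows for {}", result.rowcount, query_config.node.address)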

@erosselli self-requested a review September 23, 2024 21:13

@pattisdr (Contributor, Author) commented Sep 23, 2024

Lots of failing tests; on first glance a lot aren't related, but let me go through and double-check them.

Increased failures in SaaS connector tests - I wonder if erase_after could be interacting? 👀 looking

EDIT: the SaaS connector failures were flaky tests

@erosselli (Contributor) left a comment

Nice work! Tested manually and it looks good

Review comments (outdated, resolved): tests/ops/service/connectors/test_queryconfig.py (×2)
@pattisdr (Contributor, Author) commented:

Merging, I believe the failing tests are also failing on main

@pattisdr merged commit 0e2db7c into main Sep 25, 2024
39 of 45 checks passed
@pattisdr deleted the PROD-2744-row-deletes branch September 25, 2024 02:06
cypress bot commented Sep 25, 2024

fides    Run #10128
Run status: Passed
Run duration: 00m 38s
Commit: 0e2db7c28d: Bigquery Row-Level Deletes + Erase After on Database Collections (#5293)
Committer: Dawn Pattison
Branch Review: main

Test results: Failures 0 · Flaky 0 · Pending 0 · Skipped 0 · Passing 4

Labels: run unsafe ci checks (Runs fides-related CI checks that require sensitive credentials)
Projects: None yet
2 participants