PROD-2486 Add Dynamic Erasure Email Integration #5226
fides Run #9864

| Property | Value |
|---|---|
| Project | fides |
| Branch Review | refs/pull/5226/merge |
| Run status | Passed #9864 |
| Run duration | 00m 37s |
| Commit | 25ae3fee73: Merge d5db540557e2bd39abb76c686e1b56aaf7024df8 into 77dabdd4202b35247be5cb3d6afd... |
| Committer | erosselli |

| Test results | Count |
|---|---|
| Failures | 0 |
| Flaky | 0 |
| Pending | 0 |
| Skipped | 0 |
| Passing | 4 |
```
@@ -627,6 +658,48 @@ class QueryStringWithoutTuplesOverrideQueryConfig(SQLQueryConfig):
    Generates SQL valid for connectors that require the query string to be built without tuples.
    """

    # Overrides SQLQueryConfig.generate_raw_query
    def generate_raw_query(
```
TODO: reuse code from generate_query_without_tuples
is there still some code here that could be consolidated?
You did a nice job tracking down all the individual pieces to get something working end-to-end @erosselli - and your self-review of this PR is great. But big picture, I have several concerns with how we're slotting in this new connector type and would consider refactoring so this doesn't require so much special handling. This might have to be a follow-up due to time constraints. Generally I think the dynamic erasure email connector is doing too much.
My primary issues:
- Separation of concerns: Dynamic Email Connector is doing all the heavy lifting. This includes querying the Postgres Custom Field Collection, when that is typically taken care of as part of the Postgres Custom Field Collection upstream node. Each node is typically responsible for querying itself.
- Lots of special casing given that Postgres Custom Field Collection is not reachable via custom field
- Dynamic Email Connector is being treated as a bulk email connector. Bulk emails are sent once weekly, combining the last week's users that need their info deleted into a single email. However, with dynamic emails, I don't think we get a lot of benefit out of postponing these to fire once weekly. Have we considered firing these as part of the erasure graph itself?
Here's how I might approach this connector instead:
- Make collections like the Postgres Custom Field Collection reachable by custom field, not just identity data. That way the Postgres Custom Field Collection could be a proper part of the graph, with upstream nodes (the root collection). I definitely get that this is easier said than done, but avoids a lot of the special casing.
- Also add the Dynamic Email Collection to the graph. We typically need a DatasetConfig/Dataset for this. One way to achieve this would be when setting up the integration, a dataset is automatically created with just a single recipient_email_address field that adds fides_meta.references to point to the postgres collection with the custom field. - Then our graph would look like: Root -> Postgres Custom Field Collection -> Dynamic Email Collection -> Terminator.
- Unlike the other email connectors, the dynamic email connector could look more similar to a saas connector/database connector and have access/erasure methods. The erasure method itself would use the access data already retrieved by upstream Postgres Custom Field Collection to fire off a single erasure email instead of being one that is run at the end in the "bulk email" section. I know I'm hand waving over some of the complexity here, since there is no "access" step here -
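The auto-created dataset idea above could be sketched roughly as follows. This is a minimal sketch only: the helper name, the `fides_key` convention, and the exact `fides_meta` layout are illustrative assumptions, not the actual fides schema.

```python
# Hypothetical sketch: when the integration is set up, auto-generate a
# single-collection dataset whose one field references the Postgres
# custom-field collection, so the graph can become
# Root -> Postgres Custom Field Collection -> Dynamic Email Collection -> Terminator.
# All names here are illustrative assumptions.

def build_dynamic_email_dataset(connection_key: str, referenced_field: str) -> dict:
    """Build a minimal dataset dict for the dynamic email collection."""
    dataset, collection, field = referenced_field.split(".")
    return {
        "fides_key": f"{connection_key}_dataset",
        "collections": [
            {
                "name": "dynamic_email_collection",
                "fields": [
                    {
                        "name": "recipient_email_address",
                        "fides_meta": {
                            "references": [
                                {
                                    "dataset": dataset,
                                    "field": f"{collection}.{field}",
                                    "direction": "from",
                                }
                            ]
                        },
                    }
                ],
            }
        ],
    }

example = build_dynamic_email_dataset(
    "dynamic_email_connector", "tenants_dataset.tenants.email"
)
```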
```python
revision = '9de4bb76307a'
down_revision = 'ffee79245c9a'
```
Looks good, just a note to make sure to double check your down_revision before you merge in case someone else has added a new migration in the interim - in that case you'd adjust your down_revision to their new migration
@erosselli thanks for discussing next steps - as part of this increment, can you add code comments to the special-cased locations, noting they are temporary until we get custom fields treated in more of a first-class way?
Codecov Report. Attention: Patch coverage is …

```
@@            Coverage Diff             @@
##             main    #5226      +/-   ##
==========================================
- Coverage   86.41%   86.29%    -0.12%
==========================================
  Files         362      365        +3
  Lines       22792    22992      +200
  Branches     3060     3090       +30
==========================================
+ Hits        19695    19842      +147
- Misses       2538     2583       +45
- Partials      559      567        +8
==========================================
```
```python
if not dataset_config:
    logger.error(
        "DatasetConfig with key '{}' not found. Skipping erasure email send for connector: '{}'.",
        dataset_key,
        self.configuration.key,
    )
    return
```
@pattisdr @galvana In these cases where the connector is misconfigured, do we want to mark all the privacy requests as error or just skip them? I think we shouldn't really hit this case since we validate the config when creating the integration, but wanted to check what the expected behavior would be here. Right now I'm doing nothing which is probably not the right solution
I would raise an exception here just in case so the customer sees the failed privacy requests and we can work on addressing the connector instead of having this issue go undetected
I don't think raising the exception is enough for the error to be visible, I think I also need to add an ExecutionLog for each privacy request. I'll do that as well to ensure the privacy request shows as failed. Also, if I raise the exception it seems like the task keeps retrying, which is probably not what we want?
ah right we're outside of that typical flow
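The behavior agreed on in this thread, fail loudly per privacy request instead of silently returning, could be sketched like this. The `save_error_log` callback and exception class are hypothetical stand-ins; the real fides `ExecutionLog` model and session handling differ.

```python
# Hedged sketch: when the connector is misconfigured, record a failed
# execution log per privacy request (so each request shows as failed in
# the UI) and raise so the failure does not go undetected.
# `save_error_log` is a hypothetical helper, not a real fides API.

class MisconfiguredConnectorError(Exception):
    """Raised when the erasure email connector references a missing dataset."""

def fail_visibly(dataset_config, dataset_key, privacy_requests, save_error_log):
    if not dataset_config:
        message = f"DatasetConfig with key '{dataset_key}' not found."
        for privacy_request in privacy_requests:
            # One error log per request, so each one surfaces as failed
            save_error_log(privacy_request, message)
        # Raising (instead of returning) makes the failure visible; the
        # caller must catch it to avoid endless task retries, per the
        # discussion above.
        raise MisconfiguredConnectorError(message)

logs = []
try:
    fail_visibly(None, "missing_key", ["pr_1", "pr_2"],
                 lambda pr, msg: logs.append((pr, msg)))
except MisconfiguredConnectorError:
    pass
```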
```python
]


class BaseErasureEmailConnector(BaseEmailConnector):
```
Most of the logic here was directly taken from the GenericEmailErasureConnector, just made the typing a bit more flexible so we can inherit from it in both cases
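The inheritance shape described here can be illustrated in miniature. The class and method bodies below are simplified placeholders, not the actual fides implementations, which carry much more logic.

```python
# Minimal sketch of the refactor: shared erasure-email logic moves into
# BaseErasureEmailConnector, and both the generic and dynamic connectors
# inherit from it. Bodies are placeholders for illustration only.

class BaseEmailConnector:
    """Stand-in for the existing fides base email connector."""

class BaseErasureEmailConnector(BaseEmailConnector):
    """Logic common to generic and dynamic erasure email connectors."""

    def identities_for_batch(self) -> list:
        # Placeholder for the shared weekly-batch gathering logic
        return []

class GenericErasureEmailConnector(BaseErasureEmailConnector):
    """Recipient address comes from the connection config."""

class DynamicErasureEmailConnector(BaseErasureEmailConnector):
    """Recipient address is looked up at runtime via custom request fields."""
```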
```python
class ProcessedConfig(NamedTuple):
    graph_dataset: GraphDataset
    connector: BaseConnector
    collection_address: CollectionAddress
    collection_data: Any
    email_field: str
    vendor_field: str
    dsr_field_to_collection_field: Dict[str, str]


class BatchedIdentitiesData(NamedTuple):
    email_address: str
```
these are just some helper types I added to make the code easier to read
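As a small illustration of why these `NamedTuple` helpers read better than bare tuples (the second field here is an assumption added for demonstration; the real class may differ):

```python
# NamedTuple fields are self-documenting compared to positional indexing.
from typing import List, NamedTuple

class BatchedIdentitiesData(NamedTuple):
    email_address: str
    identities: List[str]  # assumed second field, for illustration

batch = BatchedIdentitiesData("dsr@example.com", ["user-1", "user-2"])

# batch.email_address reads far better than batch[0], while the object
# still behaves like a plain tuple where needed.
assert batch.email_address == "dsr@example.com"
assert batch[1] == ["user-1", "user-2"]
```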
Thorough error handling here @erosselli! Good job working through the tradeoffs of this very nonstandard connector. My comments are fairly minor. First thing, I'd get your migration updated with main so tests can run on this branch.
Looks good @erosselli, thanks for all the extra docstrings and clarification with this nonstandard connector added on this iteration. For a PR this size, I'd do one more skim through to make sure there's no code you're accidentally committing, check your downrev one last time, etc.
fides Run #9865

| Property | Value |
|---|---|
| Project | fides |
| Branch Review | main |
| Run status | Passed #9865 |
| Run duration | 00m 39s |
| Commit | 2e03b9b9c3: PROD-2486 Add Dynamic Erasure Email Integration (#5226) |
| Committer | erosselli |

| Test results | Count |
|---|---|
| Failures | 0 |
| Flaky | 0 |
| Pending | 0 |
| Skipped | 0 |
| Passing | 4 |
Closes PROD-2486
Description Of Changes
Main idea
This introduces a new type of integration called Dynamic Erasure Email integration. This integration is very similar to our existing Generic Erasure Email integration, with the key difference being that the recipient email address is not set as part of the connection config, but rather computed at runtime using the privacy request's custom request fields. The connection config has a `recipient_email_address` field that is a reference to the dataset field that contains the email address.

For example, let's suppose we have the following table `tenants`:

The idea behind this integration is that we set the `recipient_email_address` field to point to the `email` field of the `tenants` collection, e.g. `tenants_dataset.tenants.email`, and we provide a `tenant_id` as part of the erasure request, which will then be used to look up the corresponding email address. The `third_party_vendor_name` is also a dataset reference field, since the vendor name will depend on the email address.

Relevant changes

- A new attribute in the `fides_meta` of the field, called `custom_request_field`. This depends on PROD-2486 Add custom_request_field to FidesMeta fideslang#13
- Fields with the `custom_request_field` attribute are ignored by the reachability check
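To make the `tenants` example concrete, here is a minimal sketch of the runtime lookup. The table contents and column names are illustrative assumptions; it also shows why the no-result and multiple-result cases described later must error.

```python
# Illustrative stand-in for the `tenants` table from the example above;
# rows and column names are assumptions for demonstration.
TENANTS = [
    {"tenant_id": "tenant_1", "email": "dsr@tenant1.example.com",
     "third_party_vendor_name": "Vendor One"},
    {"tenant_id": "tenant_2", "email": "dsr@tenant2.example.com",
     "third_party_vendor_name": "Vendor Two"},
]

def lookup_recipient(tenant_id: str) -> tuple:
    """Resolve recipient email and vendor name from the tenant_id custom request field."""
    rows = [row for row in TENANTS if row["tenant_id"] == tenant_id]
    if not rows:
        # No match: the privacy request should error rather than silently skip
        raise ValueError(f"No email found for tenant '{tenant_id}'")
    if len(rows) > 1:
        # Ambiguous match: also an error case
        raise ValueError(f"Multiple emails found for tenant '{tenant_id}'")
    return rows[0]["email"], rows[0]["third_party_vendor_name"]
```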
Next Steps

Ideally we want to fully support custom request fields as part of the DSR graph. Once we implement that, a lot of the hacky workarounds in this PR won't be necessary. The idea here was to release this within one sprint, targeted at a customer request, but we do plan to fully support this feature in the future.
Code Changes

- Add a new connection type `dynamic_email_erasure` and create a new `DynamicErasureEmailConnector`
- Refactor `GenericErasureEmailConnector` into a new base class `BaseErasureEmailConnector`, that has logic common to both `GenericErasureEmailConnector` and `DynamicErasureEmailConnector`. They both inherit from this class.
- Extend `SQLConnector` and `SnowflakeConnector` with a new `execute_standalone_retrieval_query` that allows us to execute a query without caring about the incoming/outgoing edges of the node. This is used to execute the query to retrieve the email address using the provided custom fields.

Related Docs PR: https://github.com/ethyca/fidesdocs/pull/413/files
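The idea behind a standalone retrieval query, select a handful of fields from one collection filtered by custom request field values, ignoring graph edges entirely, can be sketched with plain stdlib `sqlite3`. This reimplements the concept only; the real `execute_standalone_retrieval_query` on fides's `SQLConnector` works differently.

```python
# Sketch of a "standalone" retrieval query: no graph traversal, just a
# direct SELECT of the requested fields filtered by the custom field values.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenants (tenant_id TEXT, email TEXT, vendor TEXT)")
conn.execute(
    "INSERT INTO tenants VALUES ('tenant_1', 'dsr@t1.example.com', 'Vendor One')"
)

def standalone_retrieval_query(conn, table, fields, filters):
    """Fetch `fields` from `table` where every filter column matches."""
    where = " AND ".join(f"{col} = ?" for col in filters)
    sql = f"SELECT {', '.join(fields)} FROM {table} WHERE {where}"
    cursor = conn.execute(sql, tuple(filters.values()))
    return [dict(zip(fields, row)) for row in cursor.fetchall()]

rows = standalone_retrieval_query(
    conn, "tenants", ["email", "vendor"], {"tenant_id": "tenant_1"}
)
# rows == [{'email': 'dsr@t1.example.com', 'vendor': 'Vendor One'}]
```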
Steps to Confirm

Environment setup

1. Update your `fides.toml` file in the `[execution]` section:
2. Modify the `initiate_scheduled_batch_email_send` function in `src/fides/api/service/privacy_request/email_batch_service.py` so the job runs more than once a week, e.g. every 30s:
3. Run `nox -s "fides_env(test)"`
4. Open the `postgres_example` database in the SQL client of your choice (or open a psql shell in the container) and edit the second row in the `dynamic_email_address_config` table so that the `email_address` column contains your email address. This is so you'll actually receive the emails.
5. Call the `PUT /api/v1/messaging/default` endpoint with the following payload: You can get the domain from our 1password mailgun test credentials.
6. From the Swagger API (or Postman), call the `PUT /api/v1/messaging/default/{service_type}/secret` endpoint, with `service_type` set to `mailgun`. You can get the API key from the same 1password credentials as before.
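For step 2 of the environment setup, the "run every 30 seconds" idea is sketched below with stdlib threading so it is self-contained. This is a throwaway local hack, not the actual fides scheduler code, and where the interval lives inside `initiate_scheduled_batch_email_send` is an assumption.

```python
# Throwaway local hack: re-run a job on a short interval instead of weekly.
import threading
import time

def schedule_every(interval_seconds, job, stop_event):
    """Re-run `job` every `interval_seconds` until `stop_event` is set."""
    def loop():
        # Event.wait returns False on timeout, so the job fires each interval
        while not stop_event.wait(interval_seconds):
            job()
    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker

runs = []
stop = threading.Event()
schedule_every(0.01, lambda: runs.append(1), stop)  # use 30 in real testing
time.sleep(0.1)
stop.set()
```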
Testing the integration

1. Call the `POST /api/v1/privacy-request` endpoint with the following payload:
2.1 The setup for this to work is done in the sample project environment, but you can go to the Systems list in the Admin UI and see for yourself. There's a system for the new Dynamic Erasure Email Integration, which has the dataset reference that points to the Postgres Integration for the `dynamic_email_address_config` collection.
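The request payload itself is omitted above. Purely as a rough illustration of a request carrying a custom field, the policy key, field names, and custom-field structure below are guesses, not the actual fides request schema:

```python
# Illustration only: an erasure request body shaped to carry a tenant_id
# custom request field. All keys here are assumptions; consult the real
# fides privacy request API schema.
import json

def build_erasure_request(email: str, tenant_id: str) -> list:
    return [
        {
            "policy_key": "default_erasure_policy",  # assumed policy key
            "identity": {"email": email},
            "custom_privacy_request_fields": {     # assumed structure
                "tenant_id": {"label": "Tenant ID", "value": tenant_id}
            },
        }
    ]

body = build_erasure_request("user@example.com", "tenant_1")
print(json.dumps(body, indent=2))
```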
Testing error cases

Creating the integration from scratch

1. Go to the `Integrations` tab. Click `Dynamic Erasure Email`.
2.1 Reference is not dot-delimited, e.g. `asingleword` => frontend form shows an error specifying the correct format
2.2 Reference is dot-delimited, but only has 2 parts, e.g. `collection.field` => frontend form shows an error
2.3 Reference field references a fake dataset, e.g. `fake_dataset.collection.field` => backend returns an error which is displayed on the frontend
2.4 `recipient_email_address` and `third_party_vendor_name` reference different datasets => backend returns an error which is displayed on the frontend
2.5 `recipient_email_address` and `third_party_vendor_name` reference different collections from the same dataset => backend returns an error which is displayed on the frontend

Running a misconfigured integration
1. Open the `datasetconfig` table and edit the `fides_key` of the `postgres_example_custom_request_field_dataset` row, e.g. change it to `test`.
2. Create and approve an erasure privacy request; it should fail, with an `ExecutionLog` in the `Activity Timeline` section that shows the error detail.

Privacy request with no email lookup results
Creating and approving a privacy request with the following payload should cause the privacy request to error, and an `ExecutionLog` with the error details to be added.
Privacy request with multiple results for email lookup
Creating and approving a privacy request with the following payload should cause the privacy request to error, and an `ExecutionLog` with the error details to be added.
Pre-Merge Checklist

- `CHANGELOG.md` updated
- Rebased on `main`
- `downgrade()` migration is correct and works