Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include data_use and data_category metadata in upload of access results #3674

Merged
merged 6 commits into from
Jun 28, 2023

Conversation

adamsachs
Copy link
Contributor

@adamsachs adamsachs commented Jun 26, 2023

Closes #3266

Description Of Changes

We want to include data_use and data_category metadata as part of the output package for access requests. In order to do this, we need to make that metadata available to the upload() functions that produce and serialize the output packages. This PR updates the request execution internals to pass those pieces of metadata to the appropriate upload() function.

The data_category metadata was already available as the data_category_field_mapping attribute on the DatasetGraph that's used during privacy request execution -- the update here is simply to pass that along as another argument to the upload() function.

The data_use metadata is not something we'd been keeping track of at all in privacy request execution. A new data structure has been created to maintain that metadata - a dict that maps each Collection in the graph traversal to a set of data_uses associated with the Collection, where the association is determined by the Collection --> ConnectionConfig --> System --> DataUse relationship chain. That dict is populated and stored in the redis cache with a key unique to the particular privacy request as request execution beings; the cache entry is then retrieved right before upload() is called at the end of request execution.

Code Changes

  • pass along the DatasetGraph.data_category_field_mapping as an argument to upload()
  • populate a "data use map" and store it in the redis cache, before request execution, with a key that's unique to the privacy request
  • retrieve the "data use map" that's stored in the redis cache and pass it along as an argument to upload()

Steps to Confirm

  • hard to confirm this functionality manually until we actually make use of the new metadata in the output package
  • in the meantime, we should just confirm manually that request execution does not regress with these additions!

Pre-Merge Checklist

@adamsachs adamsachs force-pushed the asachs/fides-3266 branch from 816d02e to 8d444f2 Compare June 26, 2023 16:10
@cypress
Copy link

cypress bot commented Jun 26, 2023

Passing run #2951 ↗︎

0 4 0 0 Flakiness 0
⚠️ You've recorded test results over your free plan limit.
Upgrade your plan to view test results.

Details:

Merge f4e8b98 into 0337c66...
Project: fides Commit: 8257cead6e ℹ️
Status: Passed Duration: 00:47 💡
Started: Jun 28, 2023 6:17 PM Ended: Jun 28, 2023 6:18 PM

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings.

@codecov
Copy link

codecov bot commented Jun 26, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 🎉

Comparison is base (0337c66) 87.09% compared to head (f4e8b98) 87.10%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3674      +/-   ##
==========================================
+ Coverage   87.09%   87.10%   +0.01%     
==========================================
  Files         310      310              
  Lines       19033    19048      +15     
  Branches     2437     2438       +1     
==========================================
+ Hits        16576    16592      +16     
  Misses       2028     2028              
+ Partials      429      428       -1     
Impacted Files Coverage Δ
.../service/privacy_request/request_runner_service.py 77.17% <ø> (ø)
src/fides/api/graph/graph.py 94.44% <100.00%> (+0.05%) ⬆️
src/fides/api/models/privacy_request.py 96.30% <100.00%> (+0.06%) ⬆️
...es/api/service/storage/storage_uploader_service.py 95.00% <100.00%> (+0.12%) ⬆️
src/fides/api/task/graph_task.py 93.15% <100.00%> (+0.40%) ⬆️
src/fides/api/tasks/storage.py 77.50% <100.00%> (+0.28%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@adamsachs adamsachs marked this pull request as ready for review June 26, 2023 18:30
@adamsachs adamsachs requested review from galvana and pattisdr June 26, 2023 18:30
@adamsachs
Copy link
Contributor Author

adamsachs commented Jun 26, 2023

@galvana let me know if this looks good, based on what we discussed. (side note: do we have follow up issues that track the work of actually putting this data in the output package? if not, i'm happy to create those.)

@pattisdr figured i'd tag you in as well, if you have a moment, to make sure you don't notice any red flags with the enhancement to privacy request execution and a new cache entry!

@pattisdr
Copy link
Contributor

starting review!

Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good work, all comments are minor. I like the incremental nature of this PR, first collecting this, before we actually start using the new metadata in the output package.

src/fides/api/graph/graph.py Show resolved Hide resolved
src/fides/api/task/graph_task.py Show resolved Hide resolved
src/fides/api/service/storage/storage_uploader_service.py Outdated Show resolved Hide resolved
src/fides/api/service/storage/storage_uploader_service.py Outdated Show resolved Hide resolved
tests/ops/task/test_graph_task.py Show resolved Hide resolved
src/fides/api/models/privacy_request.py Show resolved Hide resolved
src/fides/api/task/graph_task.py Show resolved Hide resolved
src/fides/api/task/graph_task.py Show resolved Hide resolved
@adamsachs adamsachs force-pushed the asachs/fides-3266 branch from 77216e4 to 8565ca3 Compare June 28, 2023 14:47
Copy link
Contributor

@galvana galvana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a quick turnaround! This looks good and is consistent with what we discussed. The only part we might have to change in the future is the structure of the data use and data category maps but we won't know the ideal structure until we finalize the design for DSR package (collection address as key instead of the data category for the data category map).

@adamsachs
Copy link
Contributor Author

What a quick turnaround! This looks good and is consistent with what we discussed. The only part we might have to change in the future is the structure of the data use and data category maps but we won't know the ideal structure until we finalize the design for DSR package (collection address as key instead of the data category for the data category map).

thanks @galvana, that makes sense! should be relatively easy to switch that around if/when needed.

@adamsachs adamsachs merged commit 65a72ff into main Jun 28, 2023
@adamsachs adamsachs deleted the asachs/fides-3266 branch June 28, 2023 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update run_access_request to include data uses and data category metadata
3 participants