Include `data_use` and `data_category` metadata in upload of access results #3674
Passing run #2951 ↗︎
Details:
This comment has been generated by cypress-bot as a result of this project's GitHub integration settings. |
Codecov Report
Patch coverage:
Additional details and impacted files

Coverage Diff

| | main | #3674 | +/- |
|---|---|---|---|
| Coverage | 87.09% | 87.10% | +0.01% |
| Files | 310 | 310 | |
| Lines | 19033 | 19048 | +15 |
| Branches | 2437 | 2438 | +1 |
| Hits | 16576 | 16592 | +16 |
| Misses | 2028 | 2028 | |
| Partials | 429 | 428 | -1 |

☔ View full report in Codecov by Sentry.
@galvana let me know if this looks good, based on what we discussed. (side note: do we have follow-up issues that track the work of actually putting this data in the output package? if not, i'm happy to create those.) @pattisdr figured i'd tag you in as well, if you have a moment, to make sure you don't notice any red flags with the enhancement to privacy request execution and the new cache entry!
starting review!
good work, all comments are minor. I like the incremental nature of this PR, first collecting this, before we actually start using the new metadata in the output package.
What a quick turnaround! This looks good and is consistent with what we discussed. The only part we might have to change in the future is the structure of the data use and data category maps, but we won't know the ideal structure until we finalize the design for the DSR package (for example, using the collection address as the key instead of the data category for the data category map).
thanks @galvana, that makes sense! should be relatively easy to switch that around if/when needed.
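For illustration only, here is a minimal sketch of what "switching that around" could look like. It assumes a hypothetical map keyed by data category whose values are lists of collection address strings; the real shape of the map in the codebase may differ, and `regroup_by_collection` is an invented helper name.

```python
from collections import defaultdict
from typing import Dict, List


def regroup_by_collection(by_category: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn {data_category: [collection_address, ...]} into
    {collection_address: [data_category, ...]}.

    Both shapes are assumptions made for this sketch.
    """
    by_collection: Dict[str, List[str]] = defaultdict(list)
    for category, addresses in by_category.items():
        for address in addresses:
            by_collection[address].append(category)
    # Sort the categories so the output is deterministic.
    return {address: sorted(cats) for address, cats in by_collection.items()}
```

Because the regrouping is a single pass over the existing map, changing the key later should not require touching how the metadata is collected.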
Closes #3266
Description Of Changes
We want to include `data_use` and `data_category` metadata as part of the output package for access requests. In order to do this, we need to make that metadata available to the `upload()` functions that produce and serialize the output packages. This PR updates the request execution internals to pass those pieces of metadata to the appropriate `upload()` function.

The `data_category` metadata was already available as the `data_category_field_mapping` attribute on the `DatasetGraph` that's used during privacy request execution; the update here is simply to pass that along as another argument to the `upload()` function.

The `data_use` metadata is not something we'd been keeping track of at all in privacy request execution. A new data structure has been created to maintain that metadata: a `dict` that maps each `Collection` in the graph traversal to a set of `data_use`s associated with the `Collection`, where the association is determined by the `Collection` --> `ConnectionConfig` --> `System` --> `DataUse` relationship chain. That `dict` is populated and stored in the redis cache with a key unique to the particular privacy request as request execution begins; the cache entry is then retrieved right before `upload()` is called at the end of request execution.

Code Changes
`DatasetGraph.data_category_field_mapping` as an argument to `upload()`
upload()
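The `data_use` map from the description can be sketched roughly as follows. This is not the code in this PR: the dataclasses, the `FakeCache` stand-in for redis, the cache key format, and every function name below are assumptions made up for illustration. It only shows the idea of resolving `Collection` --> `ConnectionConfig` --> `System` --> `DataUse`, caching the result per privacy request, and reading it back before upload.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class System:
    # Hypothetical stand-in: the data uses declared on a system.
    data_uses: Set[str] = field(default_factory=set)


@dataclass
class ConnectionConfig:
    # Hypothetical stand-in: a connection may not be linked to any system.
    key: str
    system: Optional[System] = None


@dataclass(frozen=True)
class Collection:
    # Hypothetical stand-in, identified by a "dataset:collection" address.
    address: str
    connection_key: str


class FakeCache:
    """In-memory stand-in for the redis cache (string values only)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)


def build_data_use_map(
    collections: List[Collection],
    connections: Dict[str, ConnectionConfig],
) -> Dict[str, Set[str]]:
    """Follow Collection -> ConnectionConfig -> System -> data uses."""
    result: Dict[str, Set[str]] = {}
    for collection in collections:
        config = connections.get(collection.connection_key)
        system = config.system if config else None
        # Collections without a linked system get an empty set.
        result[collection.address] = set(system.data_uses) if system else set()
    return result


def cache_key(privacy_request_id: str) -> str:
    # Assumed key format; the real key is not shown in the PR text.
    return f"DATA_USE_MAP__{privacy_request_id}"


def cache_data_use_map(
    cache: FakeCache, privacy_request_id: str, data_use_map: Dict[str, Set[str]]
) -> None:
    # Sets are not JSON-serializable, so store sorted lists instead.
    serializable = {addr: sorted(uses) for addr, uses in data_use_map.items()}
    cache.set(cache_key(privacy_request_id), json.dumps(serializable))


def get_cached_data_use_map(
    cache: FakeCache, privacy_request_id: str
) -> Optional[Dict[str, Set[str]]]:
    raw = cache.get(cache_key(privacy_request_id))
    if raw is None:
        return None
    return {addr: set(uses) for addr, uses in json.loads(raw).items()}
```

In this sketch the map is written once when execution starts and read once just before upload, so the cache entry only needs to live as long as the privacy request itself.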
Steps to Confirm
Pre-Merge Checklist
CHANGELOG.md