Anonymisation and anonymity testing #75

Open
sbfnk opened this issue Jun 27, 2023 · 4 comments

@sbfnk

sbfnk commented Jun 27, 2023

A fundamental barrier to sharing linelist data for further analysis or processing is the risk of identifying individuals, which has substantial ethical and, potentially, legal implications. I wonder if linelist could help mitigate this risk by providing tools that help users ensure none of the data they share is identifiable.

I can see two potential functions that linelist could provide:

  1. A function to assess the re-identification risk, e.g. by calculating the k-anonymity of the data
  2. Some support to reduce re-identification risk, e.g. by replacing a column or set of columns with a unique identifier.
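
To make the two ideas above concrete, here is a minimal sketch in Python rather than R, purely for illustration. The function names `k_anonymity` and `pseudonymise`, the `pseudo_id` column, and the example data are all hypothetical; none of them are part of linelist. The sketch assumes the user already knows which columns are quasi-identifiers.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values for all quasi-identifier columns (0 for an empty dataset)."""
    groups = Counter(
        tuple(row[col] for col in quasi_identifiers) for row in rows
    )
    return min(groups.values(), default=0)


def pseudonymise(rows, columns, id_name="pseudo_id"):
    """Replace `columns` with a single integer identifier, one per
    distinct combination of values in those columns."""
    ids = {}
    out = []
    for row in rows:
        key = tuple(row[col] for col in columns)
        # Reuse the identifier if this combination was seen before.
        pid = ids.setdefault(key, len(ids) + 1)
        kept = {k: v for k, v in row.items() if k not in columns}
        out.append({id_name: pid, **kept})
    return out


# Made-up example data, for illustration only.
cases = [
    {"name": "A", "age_group": "20-29", "district": "North", "outcome": "recovered"},
    {"name": "B", "age_group": "20-29", "district": "North", "outcome": "died"},
    {"name": "C", "age_group": "30-39", "district": "South", "outcome": "recovered"},
]

k = k_anonymity(cases, ["age_group", "district"])
shared = pseudonymise(cases, ["name"])
```

Here the third case is the only one in its age group and district, so the dataset is only 1-anonymous with respect to those two columns. Replacing `name` with an identifier removes the direct identifier but does not change that value, which is why the two functions are complementary rather than interchangeable.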
@bahadzie bahadzie self-assigned this Jul 3, 2023
@Bisaloo
Member

Bisaloo commented Jul 10, 2023

Thanks for starting this conversation!

I think this will be partly addressed by the coming Privacy Enhancing Techniques challenge. The data type will be slightly different but the methods can probably be used here as well.

In terms of scope, I believe this should live in a different package. The linelist package should only define the linelist object format, and the basic methods to manipulate it. Any other complex operation on linelist objects should likely live in a separate package.

@Bisaloo
Member

Bisaloo commented Jul 25, 2023

Useful related resource: https://osf.io/xpj38/

@Bisaloo
Member

Bisaloo commented Oct 31, 2023

Thanks again for the suggestion but I've thought about this more and I'm convinced this is outside the scope of linelist. I'm happy to collaborate on a separate package that would be interoperable with linelist and focus on anonymisation.

Some existing resources have been shared in WHO-Collaboratory/collaboratory-epipipeline-community#12 but please do open a new thread in the discussion board if you believe there are still gaps in the ecosystem or that it would be worthwhile to provide an alternative.

@Bisaloo Bisaloo closed this as not planned Oct 31, 2023
@Bisaloo
Member

Bisaloo commented Jul 8, 2024

Thinking about this again, and based on the feedback during the DPGA submission, I think there would be value in adding a couple of extra lines to make_linelist() to warn users when they are working with data that carries a re-identification risk (via k-anonymity testing).

I still think anonymisation is out of scope, but a warning would be nice.
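
As a rough illustration of what such a warning could look like, here is a minimal sketch in Python rather than R. It is not the make_linelist() implementation: the function name `check_reidentification_risk`, the `min_k` parameter and its default of 5, and the example data are all assumptions made for the example.

```python
import warnings
from collections import Counter


def check_reidentification_risk(rows, quasi_identifiers, min_k=5):
    """Warn when the smallest group of rows sharing the same
    quasi-identifier values is smaller than `min_k` (hypothetical
    threshold). Returns the computed k so callers can inspect it."""
    groups = Counter(
        tuple(row[col] for col in quasi_identifiers) for row in rows
    )
    k = min(groups.values(), default=0)
    if rows and k < min_k:
        warnings.warn(
            f"Data is only {k}-anonymous for columns {quasi_identifiers}; "
            "individuals may be re-identifiable.",
            UserWarning,
            stacklevel=2,
        )
    return k


# Made-up example data, for illustration only.
cases = [
    {"age_group": "20-29", "district": "North"},
    {"age_group": "20-29", "district": "North"},
    {"age_group": "30-39", "district": "South"},
]

# Capture the warning so the example can show that it was raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    k = check_reidentification_risk(cases, ["age_group", "district"])
```

The check only warns and returns the data's k value; it never modifies the data, which keeps anonymisation itself out of scope as stated above.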

@Bisaloo Bisaloo reopened this Jul 8, 2024