
Re-Evaluate Anonymisation and Security Measure names for Correctness #15

Closed
coolharsh55 opened this issue Jun 24, 2021 · 16 comments

@coolharsh55
Collaborator

Migrated ISSUE-33: The categorisation of Pseudoanonymisation and Encryption is not (semantically) correct

State: RAISED
Raised by: Harshvardhan J. Pandit
Opened on: 2019-11-26
Description: (from presentation to Kantara CISWG) Anonymisation is a subclass of Pseudoanonymisation, which conflicts semantically: it specifies that anonymisation is a type of pseudoanonymisation, which may not be intended. Also, Pseudoanonymisation and Encryption should not be grouped together (as a concept).
Reporter: Harsh
Notes: suggested to start a discussion on this issue.

@mayaborges

I agree that Anonymisation should not be a subclass of Pseudoanonymisation, given that data cannot be both anonymised and pseudoanonymised. It could be argued that Anonymisation could be either Full (or True) Anonymisation or Pseudoanonymisation, in which case Pseudoanonymisation would be a subclass of Anonymisation, but that may introduce confusion between Anonymisation and Full Anonymisation and therefore be undesirable. So having Anonymisation and Pseudoanonymisation as parallels may be the best solution.

A possible name for a superclass for both types of anonymisation as well as encryption might be Data Obfuscation.

@coolharsh55
Collaborator Author

Hi Maya, thanks for the input; I agree with your arguments. I tried looking up the EDPB and ISO definitions for these terms and how they are used, and they are similar to what you propose. But other usages (e.g. industry, technical) consider 'anonymisation' a broad range of techniques which also includes pseudo-anonymisation.

Then there is further confusion as to what data is produced as an outcome of these processes. An anonymisation process may still produce personal data (non-anonymous) if it is associated with an identifier. For example, consider the case where an identifier is associated with an exact location, and the anonymisation technique replaces this with the country. The data has now gone through an anonymisation process but is still personal data. So there is a distinction between anonymisation as a technical term and anonymisation as applied under the GDPR.

To support your proposal, maybe we can have Anonymisation as the general class of anonymisation-related techniques, and specifically PseudoAnonymisation and CompleteAnonymisation as subclasses. Data Obfuscation involves other techniques in addition to anonymisation, so it can be the parent class of Anonymisation once those other concepts have been identified.

@coolharsh55
Collaborator Author

Recording conversation at PEPR'22 about Anonymisation, where Damien pointed out this problem. The potential operation is changing "Anonymisation" to "AnonymisationMeasure" and "CompleteAnonymisation" to "Anonymisation" so as to bring these concepts in line with what is defined legally and in standards (e.g. ISO 29100) while keeping the 'taxonomy' of anonymisation approaches in tech/org measures.

@TedTed

TedTed commented Jun 28, 2022

Thanks Harshvardhan! To add a bit more explanation to this, I see a fairly serious risk in calling "Anonymization" the concept that corresponds to "the class of measures/processes that are used in order to make data less identifiable": we end up in a situation where people might use "Anonymization" on their data, and end up with data that is not "anonymized" according to ISO standards & EU regulation. This confusion happens frequently in the media, due to the use of the word "anonymization" to mean "de-identification" in the US. I've seen this create problems in my previous role at a big tech company, which is partly why we decided to only call something "anonymization" if it reached the high bar of making it impossible to re-identify people.

I strongly support changing "CompleteAnonymization" to simply "Anonymization", so that something is called "Anonymization" if and only if it leads to anonymized data, and the confusion disappears. Changing "Anonymization" to "AnonymizationMeasure" helps people understand that this might not be enough, so this definitely seems much better to me. It might not be enough, though. An alternative would be to call this "DeidentificationMeasure", and rename the process of removing identifiers to something like "IdentifierRedaction" to avoid confusion. Yet another alternative, clearer but verbose, would be something like "ReidentificationRiskMitigation", to better capture the idea of a "measure towards making it harder to identify people".

@coolharsh55 coolharsh55 changed the title Categorisation of Pseudo-Anonymisation and Encryption is not correct Re-Evaluate Anonymisation and Security Measure names for Correctness Jun 28, 2022
@coolharsh55
Collaborator Author

Thanks @TedTed ; I have updated the title on this issue to (re-)evaluate all names in tech/org measures with this perspective, and make changes where necessary.

@coolharsh55 coolharsh55 added this to the DPV v1 milestone Jun 30, 2022
@coolharsh55
Collaborator Author

Hi All, thanks for the feedback. The structure is now as follows:

  • DataAnonymisationTechnique
    • Anonymisation
    • Pseudonymisation
    • Deidentification
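As a minimal illustrative sketch (plain Python rather than DPV's actual RDF serialisation; the hierarchy-walking helper below is hypothetical, not part of DPV), the grouping above amounts to three parallel subclasses under one umbrella term:

```python
# Hypothetical sketch of the proposed grouping as subclass relations.
# The dict maps each concept to its parent, mirroring the list above.
PARENT = {
    "Anonymisation": "DataAnonymisationTechnique",
    "Pseudonymisation": "DataAnonymisationTechnique",
    "Deidentification": "DataAnonymisationTechnique",
}

def is_subclass_of(concept: str, ancestor: str) -> bool:
    """Walk the parent chain to test transitive subclass membership."""
    while concept in PARENT:
        concept = PARENT[concept]
        if concept == ancestor:
            return True
    return False

print(is_subclass_of("Pseudonymisation", "DataAnonymisationTechnique"))  # True
print(is_subclass_of("Pseudonymisation", "Anonymisation"))               # False
```

Note that in this arrangement Pseudonymisation and Anonymisation are siblings, so neither is implied to be a kind of the other.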

@derhagen

I fail to see the added value of introducing Deidentification over DataAnonymisationTechnique, which are defined as

DataAnonymisationTechnique: Use of anonymisation techniques that reduce the identifiability in data
Deidentification: Removal of identity or information to reduce identifiability

By definition, any measure that reduces the identifiability of data needs to "remove information" in some sense. Therefore, Deidentification does not narrow down the space of techniques, and should either be further specified or omitted. Was Deidentification included with a reference to HIPAA? Even in that case, we should consider replacing Deidentification with the "Expert Determination" and "Safe Harbor" methods as mentioned here: https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html

Apart from that, even though you renamed Anonymization to DataAnonymizationTechnique, I discovered this issue because I thought "Wait, Pseudonymization is not an Anonymization technique!". What about something along the lines of DataObfuscationTechnique?

@derhagen

This discussion should probably be held in parallel with NonPersonalData and its subclasses, where some tidying up might be necessary. The Note of AnonymisedData refers to AnonymisedDataWithinScope, which does not seem to exist yet (ContextuallyAnonymisedData is a proposed term), and according to the ENISA source, SyntheticData "can be personal data, which are manipulated in a way to limit the potentials for individuals’ re-identification", which is not entirely aligned with DPV's definition.

@derhagen

The GDPR's approach to anonymity (Recital 26) is a rather risk-based "reasonable likelihood" test, based on

  • the costs of and
  • the amount of time required for identification, taking into consideration
  • the available technology
  • at the time of the processing and
  • [future] technological developments

Hence, these factors should be represented more precisely in the respective Class descriptions. As all of this is an active area of research and (in my opinion) not conclusively addressed by courts, it might make sense to mark these Classes as unstable or proposed, if that is possible?

@coolharsh55
Collaborator Author

Hi.

I fail to see the added value of introducing Deidentification over DataAnonymisationTechnique, which are defined as...

Deidentification is a specific category of anonymisation techniques that focuses on reducing identifiability. Anonymisation is broader than identifier removal because it also relates to potential re-combinations with other datasets that create identifiability.

Was Deidentification included with a reference to HIPAA? Even in that case, we should consider to replace Deidentification with the "Expert Determination" and "Safe Harbor" methods as mentioned here: https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html

Deidentification is a common term in this domain; e.g. there is even an ISO standard about it (20889:2018, https://www.iso.org/standard/69373.html). For HIPAA, the title explicitly states de-identification, which is a strong argument for representing that concept. Further types of de-identification processes should be modelled as subclasses/sub-types of Deidentification, and not replace it. I would prefer the ISO terminology over HIPAA in this case as it is broader in scope and represents greater technical consensus, with HIPAA concepts added later within the resulting hierarchy (if needed). Pseudonymisation is declared as a DataAnonymisationTechnique (and not as a type of Anonymisation) for the sake of grouping anonymisation-related concepts together under an umbrella term.
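As a hypothetical sketch of the point above (plain Python rather than DPV's RDF; "ExpertDetermination" and "SafeHarbor" are illustrative HIPAA-derived names, not actual DPV terms), the HIPAA methods would sit as subclasses under Deidentification rather than replacing it:

```python
# Hypothetical sketch: HIPAA's de-identification methods modelled as
# subclasses of Deidentification, which itself sits under the umbrella term.
# None of these names are confirmed DPV IRIs; they illustrate the hierarchy only.
PARENT = {
    "Deidentification": "DataAnonymisationTechnique",
    "ExpertDetermination": "Deidentification",  # hypothetical HIPAA-derived term
    "SafeHarbor": "Deidentification",           # hypothetical HIPAA-derived term
}

def is_a(concept: str, ancestor: str) -> bool:
    """Transitively check whether `concept` falls under `ancestor`."""
    while concept in PARENT:
        concept = PARENT[concept]
        if concept == ancestor:
            return True
    return False

print(is_a("SafeHarbor", "DataAnonymisationTechnique"))  # True
```

This keeps Deidentification intact as the ISO-aligned concept while still allowing regulation-specific methods to be attached beneath it later.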

The Note of AnonymisedData refers to AnonymisedDataWithinScope, which does not seem to exist yet (ContextuallyAnonymisedData is a proposed term), and according to the ENISA source, SyntheticData "can be personal data, which are manipulated in a way to limit the potentials for individuals’ re-identification", which is not entirely aligned with DPV's definition.

AnonymisedDataWithinScope has been changed to ContextuallyAnonymisedData, and the note has been updated. Where SyntheticData is also personal data, the data should also be declared as a subclass/type of PersonalData. The note states it can be personal or non-personal. The description is taken from the ENISA guide on Data Protection Engineering: https://www.enisa.europa.eu/publications/data-protection-engineering

The GDPR (Recital 26) approach to anonymity is based on a rather risk-based "reasonable likeliness", based on
* the costs of and
* the amount of time required for identification, taking into consideration
* the available technology
* at the time of the processing and
* [future] technological developments

Hence, these factors should be represented more precisely in the respective Class descriptions. As all of this is an active area of research and (in my opinion) not conclusively addressed by courts, it might make sense to mark these Classes as unstable or proposed, if that is possible?

I see the value in representing this as a concept, but am unsure as to how it should be associated with processing information. My guess is to provide it as an organisational measure, similar to policies and assessments: an IdentifiabilityAssessment as an OrganisationalMeasure with the stated Recital 26 concepts as descriptions. I do not think we should represent each of those factors individually as concepts and properties only for the scope of identifiability. Costs, time for technical processes, technology availability (e.g. TRL in SotA), and future predictions are far too broad and relevant to a lot of other concepts, so they should be modelled with a greater scope (and careful consideration). I can add these as proposed concepts if you or someone else is willing to take on the task of investigating them.

@coolharsh55
Collaborator Author

We discussed this in today's meeting and are okay with the current list. We're keeping this open in case there are further discussions. Otherwise we will close this in the coming weeks as completed.

@TedTed

TedTed commented Nov 23, 2022

For context, does the "current list" refer to this comment or to the state of the world prior to this issue?

@coolharsh55
Collaborator Author

Current list as in the concepts that are in DPV as of now, after the comments.

@derhagen

Sorry for the late response, but I continue to raise the argument that Pseudonymization is not an anonymisation technique.

Thank you for your clarification of Deidentification; I think the fact that it refers to a term from an ISO standard should be mentioned in the Class description. Strictly following the Class descriptions as they are right now, Deidentification and DataAnonymisationTechnique describe equivalent things, without the additional knowledge of the mentioned ISO standard.

With respect to the Recital 26 criteria for anonymised data, I didn't propose to add these as organizational measures - even though that's a good idea - but simply to add a reference to Recital 26 and the mentioned criteria to the Class description or note, as they define what anonymised data is in the first place.

@coolharsh55
Collaborator Author

coolharsh55 commented Nov 24, 2022

Hi. Thanks for your comment, I understand your point, and the need to change this.

I continue to raise the argument that Pseudonymization is not an anonymisation technique.

Yes, strictly speaking this is correct, though the concept DataAnonymisationTechnique was intended to group related concepts together, as noted by the Irish Data Protection Commission in their Guidance on Anonymisation and Pseudonymisation (pg. 12). Still, as you state, it would be better to avoid this confusion. So, based on the rationale laid out in NIST's NISTIR 8053 De-Identification of Personal Information, these concepts are organised as follows:

  • DeIdentification as the top concept
    • Anonymisation
      • CompleteAnonymisation (edited to remove concept)
    • Pseudonymisation
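A minimal sketch of the revised arrangement (again plain Python rather than DPV's RDF; the `ancestors` helper is illustrative, not part of DPV), with DeIdentification as the grouping parent per NISTIR 8053:

```python
# Hypothetical sketch of the revised hierarchy: DeIdentification is the top
# concept, and Pseudonymisation is no longer placed under Anonymisation.
PARENT = {
    "Anonymisation": "DeIdentification",
    "Pseudonymisation": "DeIdentification",
}

def ancestors(concept: str) -> list:
    """Collect all transitive parents of a concept, nearest first."""
    result = []
    while concept in PARENT:
        concept = PARENT[concept]
        result.append(concept)
    return result

# Pseudonymisation is grouped with Anonymisation under DeIdentification,
# without implying it is itself a kind of Anonymisation.
print(ancestors("Pseudonymisation"))  # ['DeIdentification']
```

This resolves the semantic objection raised earlier in the thread: grouping is achieved via the neutral parent rather than by nesting one technique under the other.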

to add a reference to Recital 26 and the mentioned criteria to the Class description or note, as they define what anonymised data is in the first place

Instead of GDPR's recitals, the techniques have been linked to ISO 29100:2011 Security Techniques -- Privacy Framework definitions which are more broadly used.

coolharsh55 added a commit that referenced this issue Nov 24, 2022
The following typos in IRIs were fixed using the new SHACL shapes from
previous commit:
- dpv:expiry relation instead of dpv:hasExpiry relation in consent
- dpv:hasConsequenceOn was used as a parent even though it was proposed.
  The term has been promoted to accepted status
- Typos in Technical measures where Crypto- was mistyped as Cryto-

Errors in labels:
- MaintainCreditCheckingDatabase
- MaintainCreditRatingDatabase

The following terms were updated:
- GDPR's legal bases where text has been added from Art.6 and the parent
  terms have been aligned with main spec's legal bases (including
  creation of new terms to match granularity)
- Anonymisation and Pseudonymisation have been changed to be types of
  Deidentification techniques (as the grouping parent concept) to
  distinguish them following discussions in #15
- DPV-LEGAL has laws and DPAs for USA from contributions by @JonathanBowker
@coolharsh55 coolharsh55 modified the milestones: DPV v1, DPV v1.1 May 10, 2023
@coolharsh55 coolharsh55 modified the milestones: DPV v1.1, dpv v2 Apr 13, 2024
@coolharsh55
Collaborator Author

Reviewed and closed based on implementation in https://w3id.org/dpv#vocab-TOM-technical which contains the described structure.
