anonymization

python3 names.py dummy-sample.tab

This script anonymizes the comment field ('omschrijving') of Dutch bank transactions by removing person names. It takes as input a tab-separated text file with the following columns:

MINISTERIE, BOEKJAAR, NAAM LEVERANCIER, OMSCHRIJVING, BEDRAG, VALUTA, GB_DATUM, EUR_BEDRAG
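
To illustrate this input format, here is a minimal sketch that reads such a file with Python's standard `csv` module. It assumes the file has no header row and that the columns appear in exactly the order listed above; neither detail is confirmed by this README. `read_transactions` is a hypothetical helper, not a function from names.py, and the sample row is made up.

```python
import csv
import io

# Column order as listed above; assumed to match the file exactly.
COLUMNS = ["MINISTERIE", "BOEKJAAR", "NAAM LEVERANCIER", "OMSCHRIJVING",
           "BEDRAG", "VALUTA", "GB_DATUM", "EUR_BEDRAG"]


def read_transactions(handle):
    """Yield one dict per well-formed row of a tab-separated transaction file."""
    # QUOTE_NONE: treat quote characters in comments as ordinary text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip rows with an unexpected number of fields
        yield dict(zip(COLUMNS, row))


# Made-up example row, not real transaction data.
sample = "MinX\t2015\tACME BV\tbetaling dhr Jansen\t100,00\tEUR\t2015-01-31\t100,00\n"
rows = list(read_transactions(io.StringIO(sample)))
print(rows[0]["OMSCHRIJVING"])
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("dummy-sample.tab", encoding="utf-8", newline="")`; the encoding is an assumption.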

It uses a number of external resources:

a list of 10,000 Dutch surnames and prefixes (‘de’, ‘ter’, ‘van’ etc.). Downloaded from naamkunde.net
a list of 9,755 Dutch first names. Downloaded from naamkunde.net
a list of 381,292 Dutch words. The file DFW.CD from the CELEX database
a list of abbreviations, extracted from the transaction data itself: all words of 2–4 characters that consist of only capital letters, are not a prefix or salutation (‘DHR’, ‘MEVR’, etc.), and occur at least 50 times in the data.
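
The abbreviation criteria in the last item can be sketched as follows. This is an illustration of the description, not the code that produced abbrev_freq.txt: splitting on whitespace, the contents of `EXCLUDED`, and the name `extract_abbreviations` are all assumptions.

```python
import re
from collections import Counter

# Assumed examples of prefixes and salutations to exclude; the real list may differ.
EXCLUDED = {"DHR", "MEVR", "DE", "TER", "VAN"}

# A token qualifies if it is 2-4 characters long and all capital letters.
ABBREV = re.compile(r"[A-Z]{2,4}")


def extract_abbreviations(comments, min_count=50, excluded=EXCLUDED):
    """Return the all-caps tokens of 2-4 letters seen at least min_count times."""
    counts = Counter(
        token
        for text in comments
        for token in text.split()  # naive whitespace tokenization (assumption)
        if ABBREV.fullmatch(token) and token not in excluded
    )
    return {word for word, n in counts.items() if n >= min_count}


# Made-up comments: 'BTW' occurs 50 times, 'NL' only 3 times.
comments = ["BTW factuur DHR Jansen"] * 50 + ["KvK ABCDE NL"] * 3
print(sorted(extract_abbreviations(comments)))
```

With whitespace tokenization, a token with attached punctuation such as `BTW,` is not counted; a real implementation would probably strip punctuation first.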

The generated output is a tab-separated file with the following 3 columns:

item id of original row, anonymized omschrijving (names replaced by ***), list of found names
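
As a sketch of what one output row looks like, the snippet below builds the three columns for a single comment. The matching rule is deliberately simplified (a case-insensitive lookup in a small name set) and is not the logic of names.py, which also uses the word and abbreviation lists. The comma-separated format of the third column and the function name `anonymize` are assumptions.

```python
def anonymize(item_id, comment, names):
    """Return one tab-separated output row: id, anonymized comment, found names."""
    found = []
    out = []
    for token in comment.split():
        if token.lower() in names:  # simplified name check (assumption)
            found.append(token)
            out.append("***")
        else:
            out.append(token)
    # The separator used for the list of found names is an assumption.
    return "\t".join([str(item_id), " ".join(out), ", ".join(found)])


# Made-up name set and comment, for illustration only.
names = {"jansen", "pieter"}
row = anonymize(7, "betaling dhr Pieter Jansen factuur 12", names)
print(row)
```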

Evaluation showed that the coverage (recall) of the method is good: around 95% of the names are removed. The price for this high recall is a precision of only around 50%: for every 10 names removed, about 10 non-names are removed as well.
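
The relation between these two figures can be checked with a small calculation. The counts below are hypothetical round numbers chosen to reproduce 95% recall and 50% precision; they are not the counts from the actual evaluation.

```python
# Hypothetical counts for a data set containing 100 true names.
true_positives = 95   # names that were removed
false_negatives = 5   # names that were missed
false_positives = 95  # non-names that were removed by mistake

# Recall: share of all true names that were removed.
recall = true_positives / (true_positives + false_negatives)

# Precision: share of all removed items that were really names.
precision = true_positives / (true_positives + false_positives)

print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```

At 50% precision the number of wrongly removed words equals the number of correctly removed names, which is why 10 removed names come with about 10 removed non-names.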

License

See the LICENSE file for license rights and limitations (GNU-GPL v3.0).
