Merge branch 'main' into lira_carlini
rpreen authored Jun 5, 2024
2 parents 5b4ba84 + 3108c5f commit da2e41b
Showing 53 changed files with 286 additions and 3,137 deletions.
615 changes: 0 additions & 615 deletions .pylintrc

This file was deleted.

6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,11 @@
# Changelog

Changes:
* Add support for scikit-learn MLPClassifier ([#276](https://github.com/AI-SDC/AI-SDC/pull/276))
* Use default XGBoost params if not defined in structural attacks ([#277](https://github.com/AI-SDC/AI-SDC/pull/277))
* Clean up documentation ([#282](https://github.com/AI-SDC/AI-SDC/pull/282))
* Clean up repository and update packaging ([#283](https://github.com/AI-SDC/AI-SDC/pull/283))

## Version 1.1.3 (Apr 26, 2024)

Changes:
35 changes: 32 additions & 3 deletions CONTRIBUTING.md
@@ -1,4 +1,35 @@
# General guidance for contributors
# General Guidance for Contributors

## Development

Clone the repository and install the local package including all dependencies within a virtual environment:

```
$ git clone https://github.com/AI-SDC/AI-SDC.git
$ cd AI-SDC
$ pip install .[test]
```

Then to run the tests:

```
$ pytest .
```

## Directory Structure

* `aisdc` Contains the aisdc source code.
- `attacks` Contains a variety of privacy attacks on machine learning models.
- `preprocessing` Contains preprocessing modules for test datasets.
- `safemodel` Contains the safemodel wrappers for common machine learning models.
* `docs` Contains Sphinx documentation files.
* `examples` Contains examples of how to run the code in this repository.
* `tests` Contains unit tests.
* `user_stories` Contains user guides.

## Documentation

Documentation is hosted here: https://ai-sdc.github.io/AI-SDC/

## Style Guide

@@ -26,8 +57,6 @@ To install as a hook that executes with every `git commit`:
$ pre-commit install
```

*******************************************************************************

## Automatic Documentation

The documentation is automatically built using [Sphinx](https://www.sphinx-doc.org) and github actions.
59 changes: 11 additions & 48 deletions README.md
@@ -6,79 +6,42 @@

# AI-SDC

A collection of tools and resources for managing the statistical disclosure control of trained machine learning models. For a brief introduction, see [Smith et al. (2022)](https://doi.org/10.48550/arXiv.2212.01233).
A collection of tools and resources for managing the [statistical disclosure control](https://en.wikipedia.org/wiki/Statistical_disclosure_control) of trained [machine learning](https://en.wikipedia.org/wiki/Machine_learning) models. For a brief introduction, see [Smith et al. (2022)](https://doi.org/10.48550/arXiv.2212.01233).

### User Guides
The `aisdc` package provides:
* A variety of privacy attacks for assessing machine learning models.
* The safemodel package: a suite of open source wrappers for common machine learning frameworks, including [scikit-learn](https://scikit-learn.org) and [Keras](https://keras.io). It is designed for use by researchers in Trusted Research Environments (TREs) where disclosure control methods must be implemented, and aims to give researchers greater confidence that their models are more compliant with disclosure control; a brief usage sketch follows this list.
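
To make the intended workflow concrete, here is a minimal, hedged sketch of using a safemodel wrapper; the `SafeDecisionTreeClassifier` and `preliminary_check` names are assumptions based on the package description and may differ between releases.

```python
# Illustrative sketch only: class and method names are assumptions
# based on the safemodel package description, not taken from this diff.
from sklearn.datasets import load_breast_cancer

from aisdc.safemodel.classifiers import SafeDecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Use the defensive wrapper in place of the plain scikit-learn class;
# it constrains hyperparameters known to increase disclosure risk.
model = SafeDecisionTreeClassifier(min_samples_leaf=10)
model.fit(X, y)

# Ask the wrapper whether the fitted model appears disclosive before
# requesting release from the TRE.
msg, disclosive = model.preliminary_check()
print(msg, disclosive)
```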

A collection of user guides can be found in the 'user_stories' folder of this repository. These guides include configurable examples from the perspective of both a researcher and a TRE, with separate scripts for each. Instructions on how to use each of these scripts and which scripts to use are included in the README of the [`user_stories`](./user_stories) folder.
A collection of user guides can be found in the [`user_stories`](user_stories) folder of this repository. These guides include configurable examples from the perspective of both a researcher and a TRE, with separate scripts for each. Instructions on which scripts to use and how to run them are included in the README in that folder.

## Content

* `aisdc`
- `attacks` Contains a variety of privacy attacks on machine learning models, including membership and attribute inference.
- `preprocessing` Contains preprocessing modules for test datasets.
- `safemodel` The safemodel package is an open source wrapper for common machine learning models. It is designed for use by researchers in Trusted Research Environments (TREs) where disclosure control methods must be implemented. Safemodel aims to give researchers greater confidence that their models are more compliant with disclosure control.
* `docs` Contains Sphinx documentation files.
* `example_notebooks` Contains short tutorials on the basic concept of "safe_XX" versions of machine learning algorithms, and examples of some specific algorithms.
* `examples` Contains examples of how to run the code contained in this repository:
- How to simulate attribute inference attacks `attribute_inference_example.py`.
- How to simulate membership inference attacks:
+ Worst case scenario attack `worst_case_attack_example.py`.
+ LIRA scenario attack `lira_attack_example.py`.
- Integration of attacks into safemodel classes `safemodel_attack_integration_bothcalls.py`.
* `risk_examples` Contains hypothetical examples of data leakage through machine learning models as described in the [Green Paper](https://doi.org/10.5281/zenodo.6896214).
* `tests` Contains unit tests.

## Documentation

Documentation is hosted here: https://ai-sdc.github.io/AI-SDC/

## Installation / End-user
## Installation

[![PyPI package](https://img.shields.io/pypi/v/aisdc.svg)](https://pypi.org/project/aisdc)

Install `aisdc` (safest in a virtual env) and manually copy the [`examples`](examples/) and [`example_notebooks`](example_notebooks/).
Install `aisdc` and manually copy the [`examples`](examples/).

To install only the base package, which includes the attacks used for assessing privacy:

```
$ pip install aisdc
```

To install the base package and the safemodel package, which includes defensive wrappers for popular ML frameworks including [scikit-learn](https://scikit-learn.org) and [Keras](https://keras.io):
To additionally install the safemodel package:

```
$ pip install aisdc[safemodel]
```

## Running

To run an example, simply execute the desired script or start up `jupyter notebook` and run one of the notebooks.

For example, to run the `lira_attack_example.py`:
To run an example, simply execute the desired script or start up `jupyter notebook` and run one of the notebooks. For instance, to run the LiRA attack example:

```
$ python -m lira_attack_example
```
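
Attacks can also be run programmatically. Below is a minimal, hedged sketch; the `Target` and `WorstCaseAttack` interfaces shown are assumptions based on the module descriptions in this repository and may differ between releases.

```python
# Illustrative sketch only: the Target and WorstCaseAttack interfaces
# are assumptions based on the module docstrings, not verbatim API docs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from aisdc.attacks.target import Target
from aisdc.attacks.worst_case_attack import WorstCaseAttack

# Build a small synthetic classification task.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Wrap the trained model and its train/test data, then run a
# worst-case membership inference attack against it.
target = Target(model=model)
target.add_processed_data(X_train, y_train, X_test, y_test)

attack = WorstCaseAttack(n_reps=10)
attack.attack(target)
```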

## Development

Clone the repository and install the local package including all dependencies (safest in a virtual env):

```
$ git clone https://github.com/AI-SDC/AI-SDC.git
$ cd AI-SDC
$ pip install .[test]
```

Then run the tests:

```
$ pytest .
```

---
## Acknowledgement

This work was funded by UK Research and Innovation under Grant Numbers MC_PC_21033 and MC_PC_23006 as part of Phase 1 of the DARE UK (Data and Analytics Research Environments UK) programme (https://dareuk.org.uk/), delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK). The specific projects were Semi-Automatic checking of Research Outputs (SACRO - MC_PC_23006) and Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMATTER - MC_PC_21033). This project has also been supported by MRC and EPSRC [grant number MR/S010351/1]: PICTURES.
This work was funded by UK Research and Innovation under Grant Numbers MC_PC_21033 and MC_PC_23006 as part of Phase 1 of the [DARE UK](https://dareuk.org.uk) (Data and Analytics Research Environments UK) programme, delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK). The specific projects were Semi-Automatic checking of Research Outputs (SACRO; MC_PC_23006) and Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMATTER; MC_PC_21033). This project has also been supported by MRC and EPSRC [grant number MR/S010351/1]: PICTURES.

<img src="docs/source/images/UK_Research_and_Innovation_logo.svg" width="20%" height="20%" padding=20/> <img src="docs/source/images/health-data-research-uk-hdr-uk-logo-vector.png" width="10%" height="10%" padding=20/> <img src="docs/source/images/logo_print.png" width="15%" height="15%" padding=20/>
8 changes: 3 additions & 5 deletions aisdc/attacks/structural_attack.py
@@ -1,9 +1,7 @@
"""
Structural_attack.py.
Runs a number of 'static' structural attacks,based on:
- the target model's properties
- the TREs risk appetite as applied to tables and standard regressions
Runs a number of 'static' structural attacks based on:
(i) the target model's properties
(ii) the TRE's risk appetite as applied to tables and standard regressions.
"""

from __future__ import annotations
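For orientation, a minimal, hedged sketch of how this module might be invoked; the `StructuralAttack` class name and `attack(target)` call are assumptions inferred from the docstring, not confirmed by this diff.

```python
# Illustrative sketch only: StructuralAttack and the Target interface
# are assumptions inferred from the module docstring, not verbatim API docs.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

from aisdc.attacks.structural_attack import StructuralAttack
from aisdc.attacks.target import Target

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A deliberately under-regularised tree: very small leaves are a
# classic structural disclosure risk.
model = DecisionTreeClassifier(min_samples_leaf=1).fit(X_train, y_train)

# Wrap the model and data, then run the static structural checks.
target = Target(model=model)
target.add_processed_data(X_train, y_train, X_test, y_test)

attack = StructuralAttack()
attack.attack(target)
```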
2 changes: 1 addition & 1 deletion aisdc/attacks/target.py
@@ -1,4 +1,4 @@
"""Target.py - information storage about the target model and data."""
"""Stores information about the target model and data."""

from __future__ import annotations

6 changes: 2 additions & 4 deletions aisdc/attacks/worst_case_attack.py
@@ -1,8 +1,6 @@
"""
Worst_case_attack.py.
"""Runs a worst case attack based upon predictive probabilities."""

Runs a worst case attack based upon predictive probabilities stored in two .csv files
""" # pylint: disable = too-many-lines
# pylint: disable = too-many-lines

from __future__ import annotations

5 changes: 2 additions & 3 deletions aisdc/preprocessing/loaders.py
@@ -1,7 +1,6 @@
"""
Loaders.py
A set of useful handlers to pull in datasets common to the project and perform the appropriate
pre-processing.
A set of useful handlers to pull in datasets common to the project and perform
the appropriate pre-processing.
"""

# pylint: disable=import-error, invalid-name, consider-using-with, too-many-return-statements
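
As a pointer to how these handlers are typically called, a minimal, hedged sketch; the `get_data_sklearn` function name and the `"iris"` dataset key are assumptions about this module's interface, not confirmed by this diff.

```python
# Illustrative sketch only: get_data_sklearn and the "iris" dataset key
# are assumptions about this module's interface.
from aisdc.preprocessing.loaders import get_data_sklearn

# Assumed to return the features and labels as pandas DataFrames,
# with the module's standard pre-processing already applied.
X, y = get_data_sklearn("iris")
print(X.shape, y.shape)
```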