Commit

Merge pull request #106 from DataResponsibly/development
Development
denysgerasymuk799 committed Jan 29, 2024
2 parents 2503a94 + 5190574 commit 14c275a
Showing 267 changed files with 43,467 additions and 7,536 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/unit-tests.yml
@@ -23,5 +23,5 @@ jobs:
- name: pytest [Branch]
run: |
source ~/.venv/bin/activate
pip install requests-toolbelt==1.0.0
pip install xgboost~=1.7.2
pytest --durations=10 -n logical # Run pytest on all logical CPU cores
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
*_venv
virny_env
notebooks
*.env
.DS_Store
45 changes: 24 additions & 21 deletions README.md
@@ -28,28 +28,29 @@
</p>



## 📜 Description

**Virny** is a Python library for auditing model stability and fairness. The Virny library was
developed based on three fundamental principles:
**Virny** is a Python library for in-depth profiling of model performance across overall and disparity dimensions.
In addition to its metric computation capabilities, the library provides an interactive tool called _VirnyView_
to streamline responsible model selection and generate nutritional labels for ML models.
The Virny library was developed based on three fundamental principles:

1) easy extensibility of model analysis capabilities;

2) compatibility with user-defined/custom datasets and model types;

3) simple composition of parity metrics based on context of use.
3) simple composition of disparity metrics based on the context of use.

Virny decouples model auditing into several stages, including: **subgroup metrics computation**, **group metrics composition**,
and **metrics visualization and reporting**. This gives data scientists and practitioners more control and flexibility
to use the library for model development and monitoring post-deployment.
Virny decouples model auditing into several stages, including: **subgroup metric computation**, **disparity metric composition**,
and **metric visualization**. This gives data scientists more control and flexibility to use the library
for model development and monitoring post-deployment.

For quickstart, look at our [Use Case Examples](https://dataresponsibly.github.io/Virny/examples/Multiple_Models_Interface_Use_Case/).
For quickstart, look at [use case examples](https://dataresponsibly.github.io/Virny/examples/Multiple_Models_Interface_Use_Case/), [an interactive demo](https://huggingface.co/spaces/denys-herasymuk/virny-demo), and [a demonstrative Jupyter notebook](https://huggingface.co/spaces/denys-herasymuk/virny-demo/blob/main/notebooks/ACS_Income_Demo.ipynb).


## 🛠 Installation

Virny supports **Python 3.8 (recommended), 3.9** and can be installed with `pip`:
Virny supports **Python 3.8 and 3.9** and can be installed with `pip`:

```bash
pip install virny
```
@@ -61,29 +62,31 @@ pip install virny
* [Introduction](https://dataresponsibly.github.io/Virny/)
* [API Reference](https://dataresponsibly.github.io/Virny/api/overview/)
* [Use Case Examples](https://dataresponsibly.github.io/Virny/examples/Multiple_Models_Interface_Use_Case/)
* [Interactive Demo](https://huggingface.co/spaces/denys-herasymuk/virny-demo)


## 💡 Features

* Entire pipeline for auditing model stability and fairness
* Metrics reports and visualizations
* Ability to analyze intersections of sensitive attributes
* Entire pipeline for profiling model accuracy, stability, uncertainty, and fairness
* Ability to analyze non-binary sensitive attributes and their intersections
* Compatibility with [pre-, in-, and post-processors](https://aif360.readthedocs.io/en/latest/modules/algorithms.html#) for fairness enhancement from AIF360
* Convenient metric computation interfaces: an interface for multiple models, an interface for multiple test sets, and an interface for saving results into a user-defined database
* An `error_analysis` computation mode to analyze model stability and confidence for correct and incorrect prodictions splitted by groups
* Data loaders with subsampling for fairness datasets
* An `error_analysis` computation mode to analyze model stability and confidence for correct and incorrect predictions broken down by groups
* Metric static and interactive visualizations
* Data loaders with subsampling for popular fair-ML benchmark datasets
* User-friendly parameter input via config yaml files (see the sketch below)
* Check out [our documentation](https://dataresponsibly.github.io/Virny/) for a comprehensive overview
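
For a flavor of how these features fit together, here is a minimal sketch of the config-driven multiple-models interface, assuming the helper names used in the use-case examples (`create_config_obj`, `compute_metrics_with_config`); the module paths and the config filename are assumptions to verify against the API reference, not a definitive recipe.

```python
# A minimal sketch of the config-driven multiple-models interface.
# NOTE: module paths and helper names below are assumptions to verify
# against the API reference; this is not a definitive recipe.
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

from virny.utils.custom_initializers import create_config_obj          # assumed path
from virny.user_interfaces.multiple_models_api import (                # assumed path
    compute_metrics_with_config,
)

# dataset_name, bootstrap_fraction, n_estimators, sensitive_attributes_dct,
# etc. are supplied via a yaml config file (hypothetical filename below).
config = create_config_obj(config_yaml_path="configs/experiment_config.yaml")

models_config = {
    "LogisticRegression": LogisticRegression(max_iter=500),
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=10),
}

base_flow_dataset = ...  # a prepared Virny dataset wrapper; see the data
                         # loaders and use-case examples in the documentation

metrics_dct = compute_metrics_with_config(
    dataset=base_flow_dataset,
    config=config,
    models_config=models_config,
    save_results_dir_path="./results",
)
```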


## 📖 Library Terminology
## 📖 Library Overview

This section briefly explains the main terminology used in our library.
![Virny_Architecture](https://github.com/DataResponsibly/Virny/assets/42843889/91620e0f-11ff-4093-8fb6-c88c90bff711)

* A **sensitive attribute** is an attribute that partitions the population into groups with unequal benefits received.
* A **protected group** (or simply _group_) is created by partitioning the population by one or many sensitive attributes.
* A **privileged value** of a sensitive attribute is a value that gives more benefit to a protected group, which includes it, than to protected groups, which do not include it.
* A **subgroup** is created by splitting a protected group by privileges and disprivileged values.
* A **group metric** is a metric that shows the relation between privileged and disprivileged subgroups created based on one or many sensitive attributes.
The software framework decouples the process of model profiling into several stages, including **subgroup metric computation**,
**disparity metric composition**, and **metric visualization**. This separation empowers data scientists with greater control and
flexibility in employing the library, both during model development and for post-deployment monitoring. The above figure demonstrates
how the library constructs a pipeline for model analysis. Inputs to a user interface are shown in green, pipeline stages are shown in blue,
and the output of each stage is shown in purple.
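
Read end to end, the three stages map onto classes documented later on this page; the sketch below is illustrative only, and the import path and method name are assumptions to verify against the API reference.

```python
# Illustrative mapping of the three pipeline stages; the import path and
# compose_metrics method name are assumptions, not a definitive API.
from virny.custom_classes.metrics_composer import MetricsComposer  # assumed path

# Stage 1 -- subgroup metric computation (via the analyzers or one of the
# metric computation interfaces) yields per-model metric dataframes.
models_metrics_dct = ...        # {model_name: subgroup-metrics DataFrame}
sensitive_attributes_dct = ...  # e.g. {'sex': 1, 'race': 'White'} (hypothetical)

# Stage 2 -- disparity metric composition.
composer = MetricsComposer(models_metrics_dct, sensitive_attributes_dct)
models_composed_metrics_df = composer.compose_metrics()  # assumed method name

# Stage 3 -- metric visualization: static plots or the interactive
# VirnyView app (see MetricsInteractiveVisualizer below).
```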


## 🤗 Affiliations
15 changes: 9 additions & 6 deletions docs/api/analyzers/AbstractOverallVarianceAnalyzer.md
@@ -42,6 +42,14 @@ Abstract class for an analyzer that computes overall variance metrics for subgroups.

Number of estimators in ensemble to measure base_model stability

- **with_predict_proba** (*bool*) – defaults to `True`

[Optional] A flag indicating whether the model can return probabilities for its predictions. If not, only metrics based on labels (rather than labels and probabilities) will be computed.

- **notebook_logs_stdout** (*bool*) – defaults to `False`

[Optional] True if this interface is executed in a Jupyter notebook, False otherwise.

- **verbose** (*int*) – defaults to `0`

[Optional] Level of logs printing. A greater level provides more logs. Currently, levels 0, 1, and 2 are supported.
@@ -65,17 +73,12 @@ Abstract class for an analyzer that computes overall variance metrics for subgroups.

???- note "compute_metrics"

Measure metrics for the base model. Display plots for analysis if needed. Save results to a .pkl file
Measure metrics for the base model. Save results to a .csv file.

**Parameters**

- **make_plots** (*bool*) – defaults to `False`
- **save_results** (*bool*) – defaults to `True`
- **with_fit** (*bool*) – defaults to `True`

???- note "get_metrics_dict"

???- note "print_metrics"

???- note "save_metrics_to_file"

1 change: 1 addition & 0 deletions docs/api/analyzers/AbstractSubgroupAnalyzer.md
@@ -40,6 +40,7 @@ Abstract class for a subgroup analyzer to compute metrics for subgroups.
**Parameters**

- **y_preds**
- **models_predictions** (*dict*)
- **save_results** (*bool*)
- **result_filename** (*str*) – defaults to `None`
- **save_dir_path** (*str*) – defaults to `None`
17 changes: 10 additions & 7 deletions docs/api/analyzers/BatchOverallVarianceAnalyzer.md
@@ -12,7 +12,7 @@ Analyzer to compute subgroup variance metrics for batch learning models.

- **base_model_name** (*str*)

Model name like 'HoeffdingTreeClassifier' or 'LogisticRegression'
Model name like 'DecisionTreeClassifier' or 'LogisticRegression'

- **bootstrap_fraction** (*float*)

@@ -46,6 +46,14 @@ Analyzer to compute subgroup variance metrics for batch learning models.

Number of estimators in ensemble to measure base_model stability

- **with_predict_proba** (*bool*) – defaults to `True`

[Optional] A flag indicating whether the model can return probabilities for its predictions. If not, only metrics based on labels (rather than labels and probabilities) will be computed.

- **notebook_logs_stdout** (*bool*) – defaults to `False`

[Optional] True if this interface is executed in a Jupyter notebook, False otherwise.

- **verbose** (*int*) – defaults to `0`

[Optional] Level of logs printing. A greater level provides more logs. Currently, levels 0, 1, and 2 are supported.
@@ -69,17 +77,12 @@ Analyzer to compute subgroup variance metrics for batch learning models.

???- note "compute_metrics"

Measure metrics for the base model. Display plots for analysis if needed. Save results to a .pkl file
Measure metrics for the base model. Save results to a .csv file.

**Parameters**

- **make_plots** (*bool*) – defaults to `False`
- **save_results** (*bool*) – defaults to `True`
- **with_fit** (*bool*) – defaults to `True`

???- note "get_metrics_dict"

???- note "print_metrics"

???- note "save_metrics_to_file"

96 changes: 96 additions & 0 deletions docs/api/analyzers/BatchOverallVarianceAnalyzerPostProcessing.md
@@ -0,0 +1,96 @@
# BatchOverallVarianceAnalyzerPostProcessing

Analyzer to compute subgroup variance metrics using the defined post-processor.



## Parameters

- **postprocessor**

One of the post-processors from aif360 (https://aif360.readthedocs.io/en/stable/modules/algorithms.html#module-aif360.algorithms.postprocessing)

- **sensitive_attribute** (*str*)

A sensitive attribute to use for post-processing

- **base_model**

Base model for stability measuring

- **base_model_name** (*str*)

Model name like 'DecisionTreeClassifier' or 'LogisticRegression'

- **bootstrap_fraction** (*float*)

[0-1], fraction from train_pd_dataset for fitting an ensemble of base models

- **X_train** (*pandas.core.frame.DataFrame*)

Processed features train set

- **y_train** (*pandas.core.frame.DataFrame*)

Targets train set

- **X_test** (*pandas.core.frame.DataFrame*)

Processed features test set

- **y_test** (*pandas.core.frame.DataFrame*)

Targets test set

- **target_column** (*str*)

Name of the target column

- **dataset_name** (*str*)

Name of dataset, used for correct results naming

- **n_estimators** (*int*)

Number of estimators in ensemble to measure base_model stability

- **with_predict_proba** (*bool*) – defaults to `True`

[Optional] A flag indicating whether the model can return probabilities for its predictions. If not, only metrics based on labels (rather than labels and probabilities) will be computed.

- **notebook_logs_stdout** (*bool*) – defaults to `False`

[Optional] True if this interface is executed in a Jupyter notebook, False otherwise.

- **verbose** (*int*) – defaults to `0`

[Optional] Level of logs printing. A greater level provides more logs. Currently, levels 0, 1, and 2 are supported.




## Methods

???- note "UQ_by_boostrap"

Quantify uncertainty of the base model by constructing an ensemble from bootstrapped samples and applying a post-processing intervention.

Return a dictionary where keys are model indexes, and values are lists of the corresponding model's predictions for the X_test set.

**Parameters**

- **boostrap_size** (*int*)
- **with_replacement** (*bool*)
- **with_fit** (*bool*) – defaults to `True`

???- note "compute_metrics"

Measure metrics for the base model. Save results to a .csv file.

**Parameters**

- **save_results** (*bool*) – defaults to `True`
- **with_fit** (*bool*) – defaults to `True`

???- note "save_metrics_to_file"

1 change: 1 addition & 0 deletions docs/api/analyzers/SubgroupErrorAnalyzer.md
@@ -40,6 +40,7 @@ Analyzer to compute error metrics for subgroups.
**Parameters**

- **y_preds**
- **models_predictions** (*dict*)
- **save_results** (*bool*)
- **result_filename** (*str*) – defaults to `None`
- **save_dir_path** (*str*) – defaults to `None`
15 changes: 13 additions & 2 deletions docs/api/analyzers/SubgroupVarianceAnalyzer.md
@@ -6,7 +6,7 @@ Analyzer to compute variance metrics for subgroups.

## Parameters

- **model_setting** (*virny.configs.constants.ModelSetting*)
- **model_setting** (*[metrics.ModelSetting](../../metrics/ModelSetting)*)

Model learning type; a constant from virny.configs.constants.ModelSetting

@@ -42,10 +42,22 @@ Analyzer to compute variance metrics for subgroups.

A dictionary of protected groups where keys are subgroup names, and values are X_test row indexes corresponding to each subgroup.

- **postprocessor** – defaults to `None`

One of the post-processors from aif360 (https://aif360.readthedocs.io/en/stable/modules/algorithms.html#module-aif360.algorithms.postprocessing)

- **postprocessing_sensitive_attribute** (*str*) – defaults to `None`

A sensitive attribute to use for post-processing

- **computation_mode** (*str*) – defaults to `None`

[Optional] A non-default mode for metrics computation. Should be included in the ComputationMode enum.

- **notebook_logs_stdout** (*bool*) – defaults to `False`

[Optional] True if this interface is executed in a Jupyter notebook, False otherwise.

- **verbose** (*int*) – defaults to `0`

[Optional] Level of logs printing. A greater level provides more logs. Currently, levels 0, 1, and 2 are supported.
@@ -66,7 +78,6 @@ Analyzer to compute variance metrics for subgroups.
- **save_results** (*bool*)
- **result_filename** (*str*) – defaults to `None`
- **save_dir_path** (*str*) – defaults to `None`
- **make_plots** (*bool*) – defaults to `True`
- **with_fit** (*bool*) – defaults to `True`

???- note "set_test_protected_groups"
5 changes: 5 additions & 0 deletions docs/api/analyzers/SubgroupVarianceCalculator.md
@@ -26,6 +26,10 @@ Calculator that calculates variance metrics for subgroups.

[Optional] A non-default mode for metrics computation. Should be included in the ComputationMode enum.

- **with_predict_proba** (*bool*) – defaults to `True`

[Optional] A flag indicating whether the model can return probabilities for its predictions. If not, only metrics based on labels (rather than labels and probabilities) will be computed.




@@ -39,6 +43,7 @@ Calculator that calculates variance metrics for subgroups.

**Parameters**

- **y_preds**
- **models_predictions** (*dict*)
- **save_results** (*bool*)
- **result_filename** (*str*) – defaults to `None`
4 changes: 2 additions & 2 deletions docs/api/custom-classes/MetricsComposer.md
@@ -1,8 +1,8 @@
# MetricsComposer

Composer class that combines different subgroup metrics to create group metrics such as 'Disparate_Impact' or 'Accuracy_Parity'

Metric Composer class that combines different subgroup metrics to create disparity metrics such as 'Disparate_Impact' or 'Accuracy_Difference'.

Definitions of the disparity metrics can be found in the __init__ method of the Metrics Composer: https://github.com/DataResponsibly/Virny/blob/main/virny/custom_classes/metrics_composer.py

## Parameters

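As a concept illustration of what the composer does, the pandas sketch below derives 'Accuracy_Difference' as a difference and 'Disparate_Impact' as a ratio between disadvantaged and privileged subgroup metric columns; the column and metric names here are hypothetical, and the authoritative definitions are in the linked `metrics_composer.py`.

```python
# Concept sketch of disparity metric composition: combine per-subgroup metric
# columns into disparity metrics. Column names ('sex_priv', 'sex_dis') and
# metric names are hypothetical; see metrics_composer.py for the real ones.
import pandas as pd

subgroup_metrics_df = pd.DataFrame(
    {"Metric": ["Accuracy", "Positive-Rate"],
     "sex_priv": [0.86, 0.40],
     "sex_dis":  [0.81, 0.30]}
).set_index("Metric")

# Accuracy_Difference: disadvantaged minus privileged accuracy.
accuracy_difference = (subgroup_metrics_df.loc["Accuracy", "sex_dis"]
                       - subgroup_metrics_df.loc["Accuracy", "sex_priv"])   # -0.05

# Disparate_Impact: ratio of positive prediction rates.
disparate_impact = (subgroup_metrics_df.loc["Positive-Rate", "sex_dis"]
                    / subgroup_metrics_df.loc["Positive-Rate", "sex_priv"])  # 0.75
```
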
37 changes: 37 additions & 0 deletions docs/api/custom-classes/MetricsInteractiveVisualizer.md
@@ -0,0 +1,37 @@
# MetricsInteractiveVisualizer

Class to create an interactive web app based on model metrics.



## Parameters

- **X_data** (*pandas.core.frame.DataFrame*)

An original features dataframe

- **y_data** (*pandas.core.frame.DataFrame*)

An original target column pandas series

- **model_metrics**

A dictionary or a dataframe where keys are model names and values are dataframes of subgroup metrics for each model

- **sensitive_attributes_dct** (*dict*)

A dictionary where keys are sensitive attribute names (including attribute intersections), and values are privileged values for these attributes




## Methods

???- note "create_web_app"

Build an interactive web application.

**Parameters**

- **start_app** – defaults to `True`
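
A short usage sketch with the parameters documented above; the import path and the sensitive-attribute values are assumptions, and the inputs are placeholders produced by earlier pipeline stages.

```python
# Usage sketch for MetricsInteractiveVisualizer with the documented
# parameters. The import path is an assumption to verify.
from virny.custom_classes.metrics_interactive_visualizer import (  # assumed path
    MetricsInteractiveVisualizer,
)

# Placeholders: outputs of the earlier metric computation stages.
X_data = ...             # original features DataFrame
y_data = ...             # original target column
models_metrics_dct = ... # {model_name: subgroup-metrics DataFrame}

visualizer = MetricsInteractiveVisualizer(
    X_data=X_data,
    y_data=y_data,
    model_metrics=models_metrics_dct,
    sensitive_attributes_dct={"sex": 1, "race": "White"},  # hypothetical values
)
visualizer.create_web_app(start_app=True)  # launch the interactive VirnyView app
```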
