Commit

Merge pull request #106 from DataResponsibly/development
Development
denysgerasymuk799 committed Jan 29, 2024
2 parents 2503a94 + 5190574 commit 14c275a
Showing 267 changed files with 43,467 additions and 7,536 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/unit-tests.yml
@@ -23,5 +23,5 @@ jobs:
- name: pytest [Branch]
run: |
source ~/.venv/bin/activate
pip install requests-toolbelt==1.0.0
pip install xgboost~=1.7.2
pytest --durations=10 -n logical # Run pytest on all logical CPU cores
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
*_venv
virny_env
notebooks
*.env
.DS_Store
45 changes: 24 additions & 21 deletions README.md
@@ -28,28 +28,29 @@
</p>



## 📜 Description

**Virny** is a Python library for auditing model stability and fairness. The Virny library was
developed based on three fundamental principles:
**Virny** is a Python library for in-depth profiling of model performance across overall and disparity dimensions.
In addition to its metric computation capabilities, the library provides an interactive tool called _VirnyView_
to streamline responsible model selection and generate nutritional labels for ML models.
The Virny library was developed based on three fundamental principles:

1) easy extensibility of model analysis capabilities;

2) compatibility with user-defined/custom datasets and model types;

3) simple composition of parity metrics based on context of use.
3) simple composition of disparity metrics based on the context of use.

Virny decouples model auditing into several stages, including: **subgroup metrics computation**, **group metrics composition**,
and **metrics visualization and reporting**. This gives data scientists and practitioners more control and flexibility
to use the library for model development and monitoring post-deployment.
Virny decouples model auditing into several stages, including: **subgroup metric computation**, **disparity metric composition**,
and **metric visualization**. This gives data scientists more control and flexibility to use the library
for model development and monitoring post-deployment.

For quickstart, look at our [Use Case Examples](https://dataresponsibly.github.io/Virny/examples/Multiple_Models_Interface_Use_Case/).
For quickstart, look at [use case examples](https://dataresponsibly.github.io/Virny/examples/Multiple_Models_Interface_Use_Case/), [an interactive demo](https://huggingface.co/spaces/denys-herasymuk/virny-demo), and [a demonstrative Jupyter notebook](https://huggingface.co/spaces/denys-herasymuk/virny-demo/blob/main/notebooks/ACS_Income_Demo.ipynb).


## 🛠 Installation

Virny supports **Python 3.8 (recommended), 3.9** and can be installed with `pip`:
Virny supports **Python 3.8 and 3.9** and can be installed with `pip`:

```bash
pip install virny
```
@@ -61,29 +62,31 @@ pip install virny
* [Introduction](https://dataresponsibly.github.io/Virny/)
* [API Reference](https://dataresponsibly.github.io/Virny/api/overview/)
* [Use Case Examples](https://dataresponsibly.github.io/Virny/examples/Multiple_Models_Interface_Use_Case/)
* [Interactive Demo](https://huggingface.co/spaces/denys-herasymuk/virny-demo)


## 💡 Features

* Entire pipeline for auditing model stability and fairness
* Metrics reports and visualizations
* Ability to analyze intersections of sensitive attributes
* Entire pipeline for profiling model accuracy, stability, uncertainty, and fairness
* Ability to analyze non-binary sensitive attributes and their intersections
* Compatibility with [pre-, in-, and post-processors](https://aif360.readthedocs.io/en/latest/modules/algorithms.html#) for fairness enhancement from AIF360
* Convenient metric computation interfaces: an interface for multiple models, an interface for multiple test sets, and an interface for saving results into a user-defined database
* An `error_analysis` computation mode to analyze model stability and confidence for correct and incorrect prodictions splitted by groups
* Data loaders with subsampling for fairness datasets
* An `error_analysis` computation mode to analyze model stability and confidence for correct and incorrect predictions broken down by groups
* Metric static and interactive visualizations
* Data loaders with subsampling for popular fair-ML benchmark datasets
* User-friendly parameter input via config yaml files (see the sketch below)
* Check out [our documentation](https://dataresponsibly.github.io/Virny/) for a comprehensive overview
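
For a flavor of how these features fit together, here is a minimal sketch of the config-driven multiple-models interface, assuming the helper names used in the use-case examples (`create_config_obj`, `compute_metrics_with_config`); the module paths and the config filename are assumptions to verify against the API reference, not a definitive recipe.

```python
# A minimal sketch of the config-driven multiple-models interface.
# NOTE: module paths and helper names below are assumptions to verify
# against the API reference; this is not a definitive recipe.
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

from virny.utils.custom_initializers import create_config_obj          # assumed path
from virny.user_interfaces.multiple_models_api import (                # assumed path
    compute_metrics_with_config,
)

# dataset_name, bootstrap_fraction, n_estimators, sensitive_attributes_dct,
# etc. are supplied via a yaml config file (hypothetical filename below).
config = create_config_obj(config_yaml_path="configs/experiment_config.yaml")

models_config = {
    "LogisticRegression": LogisticRegression(max_iter=500),
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=10),
}

base_flow_dataset = ...  # a prepared Virny dataset wrapper; see the data
                         # loaders and use-case examples in the documentation

metrics_dct = compute_metrics_with_config(
    dataset=base_flow_dataset,
    config=config,
    models_config=models_config,
    save_results_dir_path="./results",
)
```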


## 📖 Library Terminology
## 📖 Library Overview

This section briefly explains the main terminology used in our library.
![Virny_Architecture](https://github.com/DataResponsibly/Virny/assets/42843889/91620e0f-11ff-4093-8fb6-c88c90bff711)

* A **sensitive attribute** is an attribute that partitions the population into groups with unequal benefits received.
* A **protected group** (or simply _group_) is created by partitioning the population by one or many sensitive attributes.
* A **privileged value** of a sensitive attribute is a value that gives more benefit to a protected group, which includes it, than to protected groups, which do not include it.
* A **subgroup** is created by splitting a protected group by privileges and disprivileged values.
* A **group metric** is a metric that shows the relation between privileged and disprivileged subgroups created based on one or many sensitive attributes.
The software framework decouples the process of model profiling into several stages, including **subgroup metric computation**,
**disparity metric composition**, and **metric visualization**. This separation empowers data scientists with greater control and
flexibility in employing the library, both during model development and for post-deployment monitoring. The above figure demonstrates
how the library constructs a pipeline for model analysis. Inputs to a user interface are shown in green, pipeline stages are shown in blue,
and the output of each stage is shown in purple.
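
Read end to end, the three stages map onto classes documented later on this page; the sketch below is illustrative only, and the import path and method name are assumptions to verify against the API reference.

```python
# Illustrative mapping of the three pipeline stages; the import path and
# compose_metrics method name are assumptions, not a definitive API.
from virny.custom_classes.metrics_composer import MetricsComposer  # assumed path

# Stage 1 -- subgroup metric computation (via the analyzers or one of the
# metric computation interfaces) yields per-model metric dataframes.
models_metrics_dct = ...        # {model_name: subgroup-metrics DataFrame}
sensitive_attributes_dct = ...  # e.g. {'sex': 1, 'race': 'White'} (hypothetical)

# Stage 2 -- disparity metric composition.
composer = MetricsComposer(models_metrics_dct, sensitive_attributes_dct)
models_composed_metrics_df = composer.compose_metrics()  # assumed method name

# Stage 3 -- metric visualization: static plots or the interactive
# VirnyView app (see MetricsInteractiveVisualizer below).
```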


## 🤗 Affiliations
15 changes: 9 additions & 6 deletions docs/api/analyzers/AbstractOverallVarianceAnalyzer.md
@@ -42,6 +42,14 @@ Abstract class for an analyzer that computes overall variance metrics for subgroups.

Number of estimators in ensemble to measure base_model stability

- **with_predict_proba** (*bool*) – defaults to `True`

[Optional] A flag indicating whether the model can return probabilities for its predictions. If not, only metrics based on labels (rather than labels and probabilities) will be computed.

- **notebook_logs_stdout** (*bool*) – defaults to `False`

[Optional] True if this interface is executed in a Jupyter notebook, False otherwise.

- **verbose** (*int*) – defaults to `0`

[Optional] Level of logs printing. A greater level provides more logs. Currently, levels 0, 1, and 2 are supported.
@@ -65,17 +73,12 @@ Abstract class for an analyzer that computes overall variance metrics for subgroups.

???- note "compute_metrics"

Measure metrics for the base model. Display plots for analysis if needed. Save results to a .pkl file
Measure metrics for the base model. Save results to a .csv file.

**Parameters**

- **make_plots** (*bool*) – defaults to `False`
- **save_results** (*bool*) – defaults to `True`
- **with_fit** (*bool*) – defaults to `True`

???- note "get_metrics_dict"

???- note "print_metrics"

???- note "save_metrics_to_file"

1 change: 1 addition & 0 deletions docs/api/analyzers/AbstractSubgroupAnalyzer.md
@@ -40,6 +40,7 @@ Abstract class for a subgroup analyzer to compute metrics for subgroups.
**Parameters**

- **y_preds**
- **models_predictions** (*dict*)
- **save_results** (*bool*)
- **result_filename** (*str*) – defaults to `None`
- **save_dir_path** (*str*) – defaults to `None`
17 changes: 10 additions & 7 deletions docs/api/analyzers/BatchOverallVarianceAnalyzer.md
@@ -12,7 +12,7 @@ Analyzer to compute subgroup variance metrics for batch learning models.

- **base_model_name** (*str*)

Model name like 'HoeffdingTreeClassifier' or 'LogisticRegression'
Model name like 'DecisionTreeClassifier' or 'LogisticRegression'

- **bootstrap_fraction** (*float*)

@@ -46,6 +46,14 @@ Analyzer to compute subgroup variance metrics for batch learning models.

Number of estimators in ensemble to measure base_model stability

- **with_predict_proba** (*bool*) – defaults to `True`

[Optional] A flag indicating whether the model can return probabilities for its predictions. If not, only metrics based on labels (rather than labels and probabilities) will be computed.

- **notebook_logs_stdout** (*bool*) – defaults to `False`

[Optional] True if this interface is executed in a Jupyter notebook, False otherwise.

- **verbose** (*int*) – defaults to `0`

[Optional] Level of logs printing. A greater level provides more logs. Currently, levels 0, 1, and 2 are supported.
@@ -69,17 +77,12 @@ Analyzer to compute subgroup variance metrics for batch learning models.

???- note "compute_metrics"

Measure metrics for the base model. Display plots for analysis if needed. Save results to a .pkl file
Measure metrics for the base model. Save results to a .csv file.

**Parameters**

- **make_plots** (*bool*) – defaults to `False`
- **save_results** (*bool*) – defaults to `True`
- **with_fit** (*bool*) – defaults to `True`

???- note "get_metrics_dict"

???- note "print_metrics"

???- note "save_metrics_to_file"

96 changes: 96 additions & 0 deletions docs/api/analyzers/BatchOverallVarianceAnalyzerPostProcessing.md
@@ -0,0 +1,96 @@
# BatchOverallVarianceAnalyzerPostProcessing

Analyzer to compute subgroup variance metrics using the defined post-processor.



## Parameters

- **postprocessor**

One of the post-processors from aif360 (https://aif360.readthedocs.io/en/stable/modules/algorithms.html#module-aif360.algorithms.postprocessing)

- **sensitive_attribute** (*str*)

A sensitive attribute to use for post-processing

- **base_model**

Base model for stability measuring

- **base_model_name** (*str*)

Model name like 'DecisionTreeClassifier' or 'LogisticRegression'

- **bootstrap_fraction** (*float*)

[0-1], fraction from train_pd_dataset for fitting an ensemble of base models

- **X_train** (*pandas.core.frame.DataFrame*)

Processed features train set

- **y_train** (*pandas.core.frame.DataFrame*)

Targets train set

- **X_test** (*pandas.core.frame.DataFrame*)

Processed features test set

- **y_test** (*pandas.core.frame.DataFrame*)

Targets test set

- **target_column** (*str*)

Name of the target column

- **dataset_name** (*str*)

Name of dataset, used for correct results naming

- **n_estimators** (*int*)

Number of estimators in ensemble to measure base_model stability

- **with_predict_proba** (*bool*) – defaults to `True`

[Optional] A flag indicating whether the model can return probabilities for its predictions. If not, only metrics based on labels (rather than labels and probabilities) will be computed.

- **notebook_logs_stdout** (*bool*) – defaults to `False`

[Optional] True if this interface is executed in a Jupyter notebook, False otherwise.

- **verbose** (*int*) – defaults to `0`

[Optional] Level of logs printing. A greater level provides more logs. Currently, levels 0, 1, and 2 are supported.




## Methods

???- note "UQ_by_boostrap"

Quantify uncertainty of the base model by constructing an ensemble from bootstrapped samples and applying a post-processing intervention.

Return a dictionary where keys are model indexes, and values are lists of the corresponding model's predictions for the X_test set.

**Parameters**

- **boostrap_size** (*int*)
- **with_replacement** (*bool*)
- **with_fit** (*bool*) – defaults to `True`

???- note "compute_metrics"

Measure metrics for the base model. Save results to a .csv file.

**Parameters**

- **save_results** (*bool*) – defaults to `True`
- **with_fit** (*bool*) – defaults to `True`

???- note "save_metrics_to_file"

1 change: 1 addition & 0 deletions docs/api/analyzers/SubgroupErrorAnalyzer.md
@@ -40,6 +40,7 @@ Analyzer to compute error metrics for subgroups.
**Parameters**

- **y_preds**
- **models_predictions** (*dict*)
- **save_results** (*bool*)
- **result_filename** (*str*) – defaults to `None`
- **save_dir_path** (*str*) – defaults to `None`
15 changes: 13 additions & 2 deletions docs/api/analyzers/SubgroupVarianceAnalyzer.md
@@ -6,7 +6,7 @@ Analyzer to compute variance metrics for subgroups.

## Parameters

- **model_setting** (*virny.configs.constants.ModelSetting*)
- **model_setting** (*[metrics.ModelSetting](../../metrics/ModelSetting)*)

Model learning type; a constant from virny.configs.constants.ModelSetting

@@ -42,10 +42,22 @@ Analyzer to compute variance metrics for subgroups.

A dictionary of protected groups where keys are subgroup names, and values are X_test row indexes corresponding to each subgroup.

- **postprocessor** – defaults to `None`

One of the post-processors from aif360 (https://aif360.readthedocs.io/en/stable/modules/algorithms.html#module-aif360.algorithms.postprocessing)

- **postprocessing_sensitive_attribute** (*str*) – defaults to `None`

A sensitive attribute to use for post-processing

- **computation_mode** (*str*) – defaults to `None`

[Optional] A non-default mode for metrics computation. Should be included in the ComputationMode enum.

- **notebook_logs_stdout** (*bool*) – defaults to `False`

[Optional] True if this interface is executed in a Jupyter notebook, False otherwise.

- **verbose** (*int*) – defaults to `0`

[Optional] Level of logs printing. A greater level provides more logs. Currently, levels 0, 1, and 2 are supported.
@@ -66,7 +78,6 @@ Analyzer to compute variance metrics for subgroups.
- **save_results** (*bool*)
- **result_filename** (*str*) – defaults to `None`
- **save_dir_path** (*str*) – defaults to `None`
- **make_plots** (*bool*) – defaults to `True`
- **with_fit** (*bool*) – defaults to `True`

???- note "set_test_protected_groups"
5 changes: 5 additions & 0 deletions docs/api/analyzers/SubgroupVarianceCalculator.md
@@ -26,6 +26,10 @@ Calculator that calculates variance metrics for subgroups.

[Optional] A non-default mode for metrics computation. Should be included in the ComputationMode enum.

- **with_predict_proba** (*bool*) – defaults to `True`

[Optional] A flag indicating whether the model can return probabilities for its predictions. If not, only metrics based on labels (rather than labels and probabilities) will be computed.




@@ -39,6 +43,7 @@ Calculator that calculates variance metrics for subgroups.

**Parameters**

- **y_preds**
- **models_predictions** (*dict*)
- **save_results** (*bool*)
- **result_filename** (*str*) – defaults to `None`
4 changes: 2 additions & 2 deletions docs/api/custom-classes/MetricsComposer.md
@@ -1,8 +1,8 @@
# MetricsComposer

Composer class that combines different subgroup metrics to create group metrics such as 'Disparate_Impact' or 'Accuracy_Parity'

Metric Composer class that combines different subgroup metrics to create disparity metrics such as 'Disparate_Impact' or 'Accuracy_Difference'.

Definitions of the disparity metrics can be found in the __init__ method of the Metrics Composer: https://github.com/DataResponsibly/Virny/blob/main/virny/custom_classes/metrics_composer.py

## Parameters

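As a concept illustration of what the composer does, the pandas sketch below derives 'Accuracy_Difference' as a difference and 'Disparate_Impact' as a ratio between disadvantaged and privileged subgroup metric columns; the column and metric names here are hypothetical, and the authoritative definitions are in the linked `metrics_composer.py`.

```python
# Concept sketch of disparity metric composition: combine per-subgroup metric
# columns into disparity metrics. Column names ('sex_priv', 'sex_dis') and
# metric names are hypothetical; see metrics_composer.py for the real ones.
import pandas as pd

subgroup_metrics_df = pd.DataFrame(
    {"Metric": ["Accuracy", "Positive-Rate"],
     "sex_priv": [0.86, 0.40],
     "sex_dis":  [0.81, 0.30]}
).set_index("Metric")

# Accuracy_Difference: disadvantaged minus privileged accuracy.
accuracy_difference = (subgroup_metrics_df.loc["Accuracy", "sex_dis"]
                       - subgroup_metrics_df.loc["Accuracy", "sex_priv"])   # -0.05

# Disparate_Impact: ratio of positive prediction rates.
disparate_impact = (subgroup_metrics_df.loc["Positive-Rate", "sex_dis"]
                    / subgroup_metrics_df.loc["Positive-Rate", "sex_priv"])  # 0.75
```
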
37 changes: 37 additions & 0 deletions docs/api/custom-classes/MetricsInteractiveVisualizer.md
@@ -0,0 +1,37 @@
# MetricsInteractiveVisualizer

Class to create an interactive web app based on model metrics.



## Parameters

- **X_data** (*pandas.core.frame.DataFrame*)

An original features dataframe

- **y_data** (*pandas.core.frame.DataFrame*)

An original target column pandas series

- **model_metrics**

A dictionary or a dataframe where keys are model names and values are dataframes of subgroup metrics for each model

- **sensitive_attributes_dct** (*dict*)

A dictionary where keys are sensitive attribute names (including attribute intersections), and values are privileged values for these attributes




## Methods

???- note "create_web_app"

Build an interactive web application.

**Parameters**

- **start_app** – defaults to `True`
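
A short usage sketch with the parameters documented above; the import path and the sensitive-attribute values are assumptions, and the inputs are placeholders produced by earlier pipeline stages.

```python
# Usage sketch for MetricsInteractiveVisualizer with the documented
# parameters. The import path is an assumption to verify.
from virny.custom_classes.metrics_interactive_visualizer import (  # assumed path
    MetricsInteractiveVisualizer,
)

# Placeholders: outputs of the earlier metric computation stages.
X_data = ...             # original features DataFrame
y_data = ...             # original target column
models_metrics_dct = ... # {model_name: subgroup-metrics DataFrame}

visualizer = MetricsInteractiveVisualizer(
    X_data=X_data,
    y_data=y_data,
    model_metrics=models_metrics_dct,
    sensitive_attributes_dct={"sex": 1, "race": "White"},  # hypothetical values
)
visualizer.create_web_app(start_app=True)  # launch the interactive VirnyView app
```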
