Merge pull request #124 from DataResponsibly/development

Release 0.5.0
DataResponsibly · Jun 2, 2024 · 76e4bb3
2 parents 14c275a + e7a304a
commit 76e4bb3
Showing 158 changed files with 115,694 additions and 182,002 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -15,7 +15,7 @@ jobs:
       fail-fast: false
       matrix:
         python: [3.8, 3.9]
-        os: [ubuntu-latest, macos-latest]
+        os: [ubuntu-latest, macos-13]
 
     uses: ./.github/workflows/build-virny.yml
     with:
@@ -28,7 +28,7 @@ jobs:
       fail-fast: false
       matrix:
         python: [3.8, 3.9]
-        os: [ubuntu-latest, macos-latest]
+        os: [ubuntu-latest, macos-13]
 
     uses: ./.github/workflows/unit-tests.yml
     with:

diff --git a/.gitignore b/.gitignore
@@ -5,6 +5,7 @@ notebooks
 .DS_Store
 .ipynb_checkpoints
 docs/examples/test.py
+tests/results
 
 # Remove big files from GitHub repo
 virny/datasets/2018

diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1,3 +1,3 @@
-include virny/datasets/*.csv
-include virny/datasets/*.gz
-include virny/datasets/*.zip
+include virny/datasets/data/*.csv
+include virny/datasets/data/*.gz
+include virny/datasets/data/*.zip
diff --git a/README.md b/README.md
@@ -32,7 +32,8 @@
 
 **Virny** is a Python library for in-depth profiling of model performance across overall and disparity dimensions. 
 In addition to its metric computation capabilities, the library provides an interactive tool called _VirnyView_ 
-to streamline responsible model selection and generate nutritional labels for ML models. 
+to streamline responsible model selection and generate nutritional labels for ML models.
+
 The Virny library was developed based on three fundamental principles: 
 
 1) easy extensibility of model analysis capabilities;
@@ -65,33 +66,52 @@ pip install virny
 * [Interactive Demo](https://huggingface.co/spaces/denys-herasymuk/virny-demo)
 
 
-## 💡 Features
+## 😎 Why Virny
+
+In contrast to existing fairness software libraries and model card generating frameworks, our system stands out in four key aspects:
+
+1. Virny facilitates the measurement of **all normatively important performance dimensions** (including _fairness_, _stability_, and _uncertainty_) for a set of initialized models, both overall and broken down by user-defined subgroups of interest.
+
+2. Virny enables data scientists to analyze performance using **multiple sensitive attributes** (including _non-binary_) and their _intersections_.
+
+3. Virny offers **diverse APIs for metric computation**, designed to analyze multiple models in a single execution, assessing stability and uncertainty on correct and incorrect predictions broken down by protected groups, and testing models on multiple test sets, including in-domain and out-of-domain.
+
+4. Virny implements a streamlined flow design tailored for **responsible model selection**, reducing the complexity associated with numerous model types, performance dimensions, and data-centric and model-centric interventions.
+
+
+## 💡 List of Features
 
-* Entire pipeline for profiling model accuracy, stability, uncertainty, and fairness
+* Profiling of all normatively important performance dimensions: accuracy, stability, uncertainty, and fairness
 * Ability to analyze non-binary sensitive attributes and their intersections
-* Compatibility with [pre-, in-, and post-processors](https://aif360.readthedocs.io/en/latest/modules/algorithms.html#) for fairness enhancement from AIF360
 * Convenient metric computation interfaces: an interface for multiple models, an interface for multiple test sets, and an interface for saving results into a user-defined database
+* Interactive _VirnyView_ visualizer that profiles dataset properties related to protected groups, computes comprehensive [nutritional labels](http://sites.computer.org/debull/A19sept/p13.pdf) for individual models, compares multiple models according to multiple metrics, and guides users through model selection
+* Compatibility with [pre-, in-, and post-processors](https://aif360.readthedocs.io/en/latest/modules/algorithms.html#) for fairness enhancement from AIF360
 * An `error_analysis` computation mode to analyze model stability and confidence for correct and incorrect predictions broken down by groups
 * Metric static and interactive visualizations
 * Data loaders with subsampling for popular fair-ML benchmark datasets
-* User-friendly parameters input via config yaml files
-* Check out [our documentation](https://dataresponsibly.github.io/Virny/) for a comprehensive overview
+* User-friendly parameters input via config yaml files 
 
+Check out [our documentation](https://dataresponsibly.github.io/Virny/) for a comprehensive overview.
 
-## 📖 Library Overview
 
-![Virny_Architecture](https://github.com/DataResponsibly/Virny/assets/42843889/91620e0f-11ff-4093-8fb6-c88c90bff711)
+## 🤗 Affiliations
+
+![NYU-UCU-Logos](https://user-images.githubusercontent.com/42843889/216840888-071bf184-f0e3-4a3e-94dc-c0d1c7784143.png)
 
-The software framework decouples the process of model profiling into several stages, including **subgroup metric computation**,
-**disparity metric composition**, and **metric visualization**. This separation empowers data scientists with greater control and
-flexibility in employing the library, both during model development and for post-deployment monitoring. The above figure demonstrates
-how the library constructs a pipeline for model analysis. Inputs to a user interface are shown in green, pipeline stages are shown in blue,
-and the output of each stage is shown in purple.
 
+## 💬 Citation
 
-## 🤗 Affiliations
+If Virny has been useful to you, and you would like to cite it in a scientific publication, please refer to the [paper](https://dl.acm.org/doi/abs/10.1145/3626246.3654738) published at SIGMOD:
 
-![NYU-UCU-Logos](https://user-images.githubusercontent.com/42843889/216840888-071bf184-f0e3-4a3e-94dc-c0d1c7784143.png)
+```bibtex
+@inproceedings{herasymuk2024responsible,
+  title={Responsible Model Selection with Virny and VirnyView},
+  author={Herasymuk, Denys and Arif Khan, Falaah and Stoyanovich, Julia},
+  booktitle={Companion of the 2024 International Conference on Management of Data},
+  pages={488--491},
+  year={2024}
+}
+```
 
 
 ## 📝 License

diff --git a/docs/.pages b/docs/.pages
@@ -1,5 +1,6 @@
 nav:
   - introduction
-  - api
   - examples
+  - glossary
+  - api
   - release_notes
diff --git a/docs/api/analyzers/AbstractOverallVarianceAnalyzer.md b/docs/api/analyzers/AbstractOverallVarianceAnalyzer.md
@@ -42,6 +42,10 @@ Abstract class for an analyzer that computes overall variance metrics for subgro
 
     Number of estimators in ensemble to measure base_model stability
 
+- **random_state** (*int*) – defaults to `None`
+
+    [Optional] Controls the randomness of the bootstrap approach for model arbitrariness evaluation
+
 - **with_predict_proba** (*bool*) – defaults to `True`
 
     [Optional] A flag indicating whether the model can return probabilities for its predictions.  If not, only metrics based on labels (not labels and probabilities) will be computed.

diff --git a/docs/api/analyzers/BatchOverallVarianceAnalyzer.md b/docs/api/analyzers/BatchOverallVarianceAnalyzer.md
@@ -46,6 +46,10 @@ Analyzer to compute subgroup variance metrics for batch learning models.
 
     Number of estimators in ensemble to measure base_model stability
 
+- **random_state** (*int*) – defaults to `None`
+
+    [Optional] Controls the randomness of the bootstrap approach for model arbitrariness evaluation
+
 - **with_predict_proba** (*bool*) – defaults to `True`
 
     [Optional] A flag indicating whether the model can return probabilities for its predictions.  If not, only metrics based on labels (not labels and probabilities) will be computed.

diff --git a/docs/api/analyzers/BatchOverallVarianceAnalyzerPostProcessing.md b/docs/api/analyzers/BatchOverallVarianceAnalyzerPostProcessing.md
@@ -54,6 +54,10 @@ Analyzer to compute subgroup variance metrics using the defined post-processor.
 
     Number of estimators in ensemble to measure base_model stability
 
+- **random_state** (*int*) – defaults to `None`
+
+    [Optional] Controls the randomness of the bootstrap approach for model arbitrariness evaluation
+
 - **with_predict_proba** (*bool*) – defaults to `True`
 
     [Optional] A flag indicating whether the model can return probabilities for its predictions.  If not, only metrics based on labels (not labels and probabilities) will be computed.

diff --git a/docs/api/analyzers/SubgroupVarianceAnalyzer.md b/docs/api/analyzers/SubgroupVarianceAnalyzer.md
@@ -50,10 +50,18 @@ Analyzer to compute variance metrics for subgroups.
 
     A sensitive attribute to use for post-processing
 
+- **random_state** (*int*) – defaults to `None`
+
+    [Optional] Controls the randomness of the bootstrap approach for model arbitrariness evaluation
+
 - **computation_mode** (*str*) – defaults to `None`
 
     [Optional] A non-default mode for metrics computation. Should be included in the ComputationMode enum.
 
+- **with_predict_proba** (*bool*) – defaults to `True`
+
+    [Optional] True, if models in models_config have a predict_proba method and can return probabilities for predictions,  False, otherwise. Note that if it is set to False, only metrics based on labels (not labels and probabilities) will be computed.  Ignored when a postprocessor is not None, and set to False in this case.
+
 - **notebook_logs_stdout** (*bool*) – defaults to `False`
 
     [Optional] True, if this interface was executed in a Jupyter notebook,  False, otherwise.
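
Reviewer note on the new `random_state` parameter: the analyzer docs above say it controls the randomness of the bootstrap used for model arbitrariness evaluation. The following is a minimal sketch of that idea using only the standard library, not Virny's actual implementation. The function name `bootstrap_indices` and the `sample_fraction` argument are hypothetical and introduced purely for illustration.

```python
import random


def bootstrap_indices(n_rows, n_estimators, sample_fraction=0.8, random_state=None):
    """Draw one list of row indices (with replacement) per estimator.

    Hypothetical helper: it only illustrates how a fixed seed makes
    bootstrap samples reproducible across runs.
    """
    # random_state=None lets Python pick a nondeterministic seed
    rng = random.Random(random_state)
    sample_size = int(n_rows * sample_fraction)
    return [
        [rng.randrange(n_rows) for _ in range(sample_size)]
        for _ in range(n_estimators)
    ]


first = bootstrap_indices(n_rows=10, n_estimators=3, random_state=42)
second = bootstrap_indices(n_rows=10, n_estimators=3, random_state=42)
print(first == second)  # True: the same seed yields the same bootstrap samples
```

With the default `random_state=None`, repeated runs would generally draw different samples, so variance metrics computed on the resulting ensemble could differ slightly between runs.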

diff --git a/docs/api/custom-classes/BaseFlowDataset.md b/docs/api/custom-classes/BaseFlowDataset.md
@@ -6,9 +6,9 @@ Dataset class with custom train and test splits that is used as input for metric
 
 ## Parameters
 
-- **init_features_df** (*pandas.core.frame.DataFrame*)
+- **init_sensitive_attrs_df** (*pandas.core.frame.DataFrame*)
 
-    Full train + test non-preprocessed dataset of features without the target column.  It is used for creating test groups.
+    Full train + test non-preprocessed dataset of sensitive attributes with initial indexes.  It is used for creating test groups.
 
 - **X_train_val** (*pandas.core.frame.DataFrame*)
 

diff --git a/docs/api/datasets/BankMarketingDataset.md b/docs/api/datasets/BankMarketingDataset.md
@@ -0,0 +1,19 @@
+# BankMarketingDataset
+
+Dataset class for the Bank Marketing dataset that contains sensitive attributes among feature columns. Source: https://github.com/tailequy/fairness_dataset/blob/main/experiments/data/bank-full.csv General description and analysis: https://arxiv.org/pdf/2110.00530.pdf (Section 3.1.5) Broad description: https://archive.ics.uci.edu/dataset/222/bank+marketing
+
+
+
+## Parameters
+
+- **subsample_size** (*int*) – defaults to `None`
+
+    Subsample size to create based on the input dataset
+
+- **subsample_seed** (*int*) – defaults to `None`
+
+    Seed for sampling using the sample() method from pandas
+
+
+
+
diff --git a/docs/api/datasets/CardiovascularDiseaseDataset.md b/docs/api/datasets/CardiovascularDiseaseDataset.md
@@ -0,0 +1,19 @@
+# CardiovascularDiseaseDataset
+
+Dataset class for the Cardiovascular Disease dataset that contains sensitive attributes among feature columns. Source and broad description: https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset
+
+
+
+## Parameters
+
+- **subsample_size** (*int*) – defaults to `None`
+
+    Subsample size to create based on the input dataset
+
+- **subsample_seed** (*int*) – defaults to `None`
+
+    Seed for sampling using the sample() method from pandas
+
+
+
+
diff --git a/docs/api/datasets/CreditCardDefaultDataset.md b/docs/api/datasets/CreditCardDefaultDataset.md
diff --git a/docs/api/datasets/DiabetesDataset.md b/docs/api/datasets/DiabetesDataset.md
diff --git a/docs/api/datasets/DiabetesDataset2019.md b/docs/api/datasets/DiabetesDataset2019.md
@@ -0,0 +1,23 @@
+# DiabetesDataset2019
+
+Dataset class for the Diabetes 2019 dataset that contains sensitive attributes among feature columns. Source and broad description: https://www.kaggle.com/datasets/tigganeha4/diabetes-dataset-2019/data
+
+
+
+## Parameters
+
+- **subsample_size** (*int*) – defaults to `None`
+
+    Subsample size to create based on the input dataset
+
+- **subsample_seed** (*int*) – defaults to `None`
+
+    Seed for sampling using the sample() method from pandas
+
+- **with_nulls** (*bool*) – defaults to `True`
+
+    Whether to keep nulls in the dataset or drop rows with any nulls. Default: True.
+
+
+
+
diff --git a/docs/api/datasets/GermanCreditDataset.md b/docs/api/datasets/GermanCreditDataset.md
@@ -0,0 +1,19 @@
+# GermanCreditDataset
+
+Dataset class for the German Credit dataset that contains sensitive attributes among feature columns. Source: https://github.com/tailequy/fairness_dataset/blob/main/experiments/data/german_data_credit.csv General description and analysis: https://arxiv.org/pdf/2110.00530.pdf (Section 3.1.3) Broad description: https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data
+
+
+
+## Parameters
+
+- **subsample_size** (*int*) – defaults to `None`
+
+    Subsample size to create based on the input dataset
+
+- **subsample_seed** (*int*) – defaults to `None`
+
+    Seed for sampling using the sample() method from pandas
+
+
+
+
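
Reviewer note on the data loader parameters: the new dataset pages above document `subsample_size`, `subsample_seed` (described as a seed for the pandas `sample()` method) and, for `DiabetesDataset2019`, `with_nulls`. The sketch below shows how such parameters could combine on a toy DataFrame. It is an assumption about the behavior, not the library's code; `load_subsample` and `toy_df` are hypothetical names.

```python
import pandas as pd


def load_subsample(df, subsample_size=None, subsample_seed=None, with_nulls=True):
    """Hypothetical loader step: optionally drop nulls, then subsample."""
    if not with_nulls:
        # Drop rows that contain any null value
        df = df.dropna()
    if subsample_size is not None:
        # A fixed random_state makes the subsample reproducible
        df = df.sample(n=subsample_size, random_state=subsample_seed)
    return df


toy_df = pd.DataFrame({"age": [25, 40, None, 33, 51], "label": [0, 1, 1, 0, 1]})
sample_a = load_subsample(toy_df, subsample_size=3, subsample_seed=7, with_nulls=False)
sample_b = load_subsample(toy_df, subsample_size=3, subsample_seed=7, with_nulls=False)
print(sample_a.equals(sample_b))  # True: same seed, same rows
```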
diff --git a/docs/api/overview.md b/docs/api/overview.md
@@ -49,10 +49,12 @@ The purpose is to provide sample datasets for functionality testing and show exa
 - [ACSMobilityDataset](../datasets/ACSMobilityDataset)
 - [ACSPublicCoverageDataset](../datasets/ACSPublicCoverageDataset)
 - [ACSTravelTimeDataset](../datasets/ACSTravelTimeDataset)
+- [BankMarketingDataset](../datasets/BankMarketingDataset)
+- [CardiovascularDiseaseDataset](../datasets/CardiovascularDiseaseDataset)
 - [CompasDataset](../datasets/CompasDataset)
 - [CompasWithoutSensitiveAttrsDataset](../datasets/CompasWithoutSensitiveAttrsDataset)
-- [CreditCardDefaultDataset](../datasets/CreditCardDefaultDataset)
-- [DiabetesDataset](../datasets/DiabetesDataset)
+- [DiabetesDataset2019](../datasets/DiabetesDataset2019)
+- [GermanCreditDataset](../datasets/GermanCreditDataset)
 - [LawSchoolDataset](../datasets/LawSchoolDataset)
 - [RicciDataset](../datasets/RicciDataset)
 - [StudentPerformancePortugueseDataset](../datasets/StudentPerformancePortugueseDataset)

diff --git a/docs/api/preprocessing/preprocess-dataset.md b/docs/api/preprocessing/preprocess-dataset.md
@@ -14,6 +14,10 @@ Preprocess an input dataset using sklearn ColumnTransformer. Split the dataset o
 
     Instance of sklearn ColumnTransformer to preprocess categorical and numerical columns.
 
+- **sensitive_attributes_dct** (*dict*)
+
+    Dictionary of sensitive attribute names and their disadvantaged values.
+
 - **test_set_fraction** (*float*)
 
     Fraction from 0 to 1. Used to split the input dataset on the train and test sets.

diff --git a/docs/api/user-interfaces/compute-metrics-with-config.md b/docs/api/user-interfaces/compute-metrics-with-config.md
@@ -26,6 +26,10 @@ Return a dictionary where keys are model names, and values are metrics for sensi
 
     [Optional] Postprocessor object to apply to model predictions before metrics computation
 
+- **with_predict_proba** (*bool*) – defaults to `True`
+
+    [Optional] True, if models in models_config have a predict_proba method and can return probabilities for predictions,  False, otherwise. Note that if it is set to False, only metrics based on labels (not labels and probabilities) will be computed.  Ignored when a postprocessor is not None, and set to False in this case.
+
 - **notebook_logs_stdout** (*bool*) – defaults to `False`
 
     [Optional] True, if this interface was executed in a Jupyter notebook,  False, otherwise.

diff --git a/docs/api/user-interfaces/compute-metrics-with-db-writer.md b/docs/api/user-interfaces/compute-metrics-with-db-writer.md
@@ -30,6 +30,10 @@ Return a dictionary where keys are model names, and values are metrics for sensi
 
     [Optional] Postprocessor object to apply to model predictions before metrics computation
 
+- **with_predict_proba** (*bool*) – defaults to `True`
+
+    [Optional] True, if models in models_config have a predict_proba method and can return probabilities for predictions,  False, otherwise. Note that if it is set to False, only metrics based on labels (not labels and probabilities) will be computed.  Ignored when a postprocessor is not None, and set to False in this case.
+
 - **notebook_logs_stdout** (*bool*) – defaults to `False`
 
     [Optional] True, if this interface was executed in a Jupyter notebook,  False, otherwise.

diff --git a/docs/api/user-interfaces/compute-metrics-with-multiple-test-sets.md b/docs/api/user-interfaces/compute-metrics-with-multiple-test-sets.md
@@ -30,6 +30,10 @@ Compute stability and accuracy metrics for each model in models_config based on
 
     Python function object that takes one argument (run_models_metrics_df) and saves this metrics df to a target database
 
+- **with_predict_proba** (*bool*) – defaults to `True`
+
+    [Optional] True, if models in models_config have a predict_proba method and can return probabilities for predictions,  False, otherwise. Note that if it is set to False, only metrics based on labels (not labels and probabilities) will be computed.  Ignored when a postprocessor is not None, and set to False in this case.
+
 - **notebook_logs_stdout** (*bool*) – defaults to `False`
 
     [Optional] True, if this interface was executed in a Jupyter notebook,  False, otherwise.
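
Reviewer note on `with_predict_proba`: the user-interface docs above state that the flag is ignored and set to False when a postprocessor is given, and that with False only label-based metrics are computed. The sketch below illustrates that rule with dummy models. The helper `resolve_with_predict_proba` is hypothetical, and the extra `hasattr` guard is an assumption added for safety, not documented Virny behavior.

```python
class LabelOnlyModel:
    """Dummy model that can only return hard labels."""

    def predict(self, rows):
        return [1 if x > 0.5 else 0 for x in rows]


class ProbabilisticModel(LabelOnlyModel):
    """Dummy model that can also return class probabilities."""

    def predict_proba(self, rows):
        return [[1 - x, x] for x in rows]


def resolve_with_predict_proba(model, with_predict_proba=True, postprocessor=None):
    """Hypothetical helper: decide whether probability-based metrics apply."""
    if postprocessor is not None:
        # Per the docs, the flag is forced to False when a postprocessor is used
        return False
    # Assumption: also require that the model actually exposes predict_proba
    return with_predict_proba and hasattr(model, "predict_proba")


print(resolve_with_predict_proba(ProbabilisticModel()))                          # True
print(resolve_with_predict_proba(LabelOnlyModel()))                              # False
print(resolve_with_predict_proba(ProbabilisticModel(), postprocessor=object()))  # False
```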

diff --git a/docs/api/utils/create-test-protected-groups.md b/docs/api/utils/create-test-protected-groups.md
@@ -10,9 +10,9 @@ Return a dictionary where keys are subgroup names, and values are X_test row ind
 
     Test feature set
 
-- **init_features_df** (*pandas.core.frame.DataFrame*)
+- **init_sensitive_attrs_df** (*pandas.core.frame.DataFrame*)
 
-    Initial full dataset without preprocessing
+    Initial full dataset of sensitive attributes without preprocessing
 
 - **sensitive_attributes_dct** (*dict*)
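
Reviewer note on the `init_features_df` to `init_sensitive_attrs_df` rename: the docs describe the new argument as the full, non-preprocessed dataset of sensitive attributes with initial indexes, and `sensitive_attributes_dct` as a mapping from attribute names to disadvantaged values. The sketch below shows one way such inputs could be used to split a test set into groups. It is a simplification that returns the matching rows of `X_test`; the function name `split_test_groups` and the `_dis` / `_priv` key suffixes are assumptions for illustration, and intersectional groups are not handled.

```python
import pandas as pd


def split_test_groups(X_test, init_sensitive_attrs_df, sensitive_attributes_dct):
    """Hypothetical helper: split X_test by disadvantaged/privileged values."""
    # Look up the raw sensitive attributes by the preserved initial indexes
    test_attrs = init_sensitive_attrs_df.loc[X_test.index]
    groups = {}
    for attr, dis_value in sensitive_attributes_dct.items():
        is_dis = test_attrs[attr] == dis_value
        groups[f"{attr}_dis"] = X_test[is_dis]
        groups[f"{attr}_priv"] = X_test[~is_dis]
    return groups


init_attrs = pd.DataFrame({"sex": ["F", "M", "F", "M"]}, index=[10, 11, 12, 13])
X_test = pd.DataFrame({"feature": [0.1, 0.7]}, index=[11, 12])
groups = split_test_groups(X_test, init_attrs, {"sex": "F"})
print(list(groups["sex_dis"].index))   # [12]
print(list(groups["sex_priv"].index))  # [11]
```

Because only the sensitive attribute columns are needed for the lookup, the preprocessed `X_test` does not have to contain them at all, which is presumably the motivation for passing them separately.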
 

diff --git a/docs/examples/.pages b/docs/examples/.pages
@@ -1,7 +1,6 @@
 title: Examples 🍱
 nav:
     - Multiple_Models_Interface_Use_Case.md
-    - Interactive_Web_App_Demo.md
     - Multiple_Models_Interface_With_DB_Writer.md
     - Multiple_Models_Interface_With_Error_Analysis.md
     - Multiple_Models_Interface_With_Multiple_Test_Sets.md