Feature/add visualization component #72

Merged
merged 37 commits into from
Dec 18, 2023
37 commits
e65dcc1
Added plot 1 to a gradio app
Sep 30, 2023
c84456e
Created subgroup and group heatmaps
Oct 1, 2023
21f783f
Added bar charts to a web app
Oct 1, 2023
e0113e1
Added bar charts to a web app
denysgerasymuk799 Oct 1, 2023
8580bbc
Added init for view 1
denysgerasymuk799 Oct 1, 2023
de6b7d7
Improved a metrics bar chart
denysgerasymuk799 Oct 2, 2023
3afea56
Tested a gradio app on a big metrics df
denysgerasymuk799 Oct 2, 2023
6da26a2
Added a gradio app for Law_School
denysgerasymuk799 Oct 4, 2023
cb207bb
Added an overall subgroup to heatmaps
denysgerasymuk799 Oct 4, 2023
498b0ef
Added a gradio app for Ricci
denysgerasymuk799 Oct 5, 2023
e49c7c8
Added minor fixes to a model selection app
denysgerasymuk799 Oct 6, 2023
69306c5
Reversed a color bar for heatmaps
denysgerasymuk799 Oct 7, 2023
6262752
Added a table with model names that satisfy all 4 constraints
denysgerasymuk799 Oct 7, 2023
6146aae
Added tolerance to heatmaps
denysgerasymuk799 Oct 8, 2023
b8ea341
Added tolerance to heatmaps
denysgerasymuk799 Oct 8, 2023
b981218
Added a test sample for data stats panel
denysgerasymuk799 Oct 9, 2023
97ccb6f
Added subgroup proportions and base rates
denysgerasymuk799 Oct 9, 2023
ab34104
Changed a default range for Label_Stability_Ratio
denysgerasymuk799 Oct 10, 2023
84cf426
Restructured section 3 in the gradio app
denysgerasymuk799 Oct 11, 2023
df44d29
Added dynamic variables for the stats bar chart
denysgerasymuk799 Oct 12, 2023
866f30f
Added minor fixes to a visualization component
denysgerasymuk799 Oct 13, 2023
7db1b9b
Resolved merge conflict
denysgerasymuk799 Nov 27, 2023
620f36f
Added model performance summary
denysgerasymuk799 Nov 28, 2023
7d2e557
Added Positive-Rate to a model performance summary plot
denysgerasymuk799 Nov 29, 2023
a773815
Improved dataset stats plot
denysgerasymuk799 Nov 29, 2023
1d5ad3a
Added overall and disparity constraints to a model selection bar chart
denysgerasymuk799 Nov 29, 2023
2a6e71e
Added uncertainty disparity bar charts
denysgerasymuk799 Nov 29, 2023
ae533db
Set red-green color palette
denysgerasymuk799 Nov 30, 2023
c1feaf9
Added test metrics for ACS Public Coverage
denysgerasymuk799 Dec 7, 2023
612b025
Save current version of tolerance
denysgerasymuk799 Dec 10, 2023
8259f79
Added dynamic tolerance
denysgerasymuk799 Dec 10, 2023
87318fd
Added tests for tolerance
denysgerasymuk799 Dec 10, 2023
52ea843
Added tests for tolerance
denysgerasymuk799 Dec 10, 2023
1d2cc3c
wip
denysgerasymuk799 Dec 17, 2023
06c60fc
Added error handling for a dataset stats screen
denysgerasymuk799 Dec 17, 2023
f06cc9f
Added all error handling
denysgerasymuk799 Dec 18, 2023
b3c88f2
wip
denysgerasymuk799 Dec 18, 2023
1 change: 0 additions & 1 deletion README.md
@@ -28,7 +28,6 @@
</p>



## 📜 Description

**Virny** is a Python library for auditing model stability and fairness. The Virny library was
324 changes: 324 additions & 0 deletions docs/examples/Multiple_Models_Interface_Vis.ipynb

Large diffs are not rendered by default.

299 changes: 299 additions & 0 deletions docs/examples/Multiple_Models_Interface_Vis_Income.ipynb
@@ -0,0 +1,299 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "248cbed8",
"metadata": {
"ExecuteTime": {
"end_time": "2023-12-10T22:37:44.370856Z",
"start_time": "2023-12-10T22:37:43.972175Z"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7ec6cd08",
"metadata": {
"ExecuteTime": {
"end_time": "2023-12-10T22:37:44.380242Z",
"start_time": "2023-12-10T22:37:44.371542Z"
}
},
"outputs": [],
"source": [
"import os\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"os.environ[\"PYTHONWARNINGS\"] = \"ignore\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "b8cb69f2",
"metadata": {
"ExecuteTime": {
"end_time": "2023-12-10T22:37:44.391659Z",
"start_time": "2023-12-10T22:37:44.380644Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Current location: /Users/denys_herasymuk/UCU/4course_2term/Bachelor_Thesis/Code/Virny\n"
]
}
],
"source": [
"cur_folder_name = os.path.basename(os.getcwd())\n",
"if cur_folder_name != \"Virny\":\n",
" os.chdir(\"../..\")\n",
"\n",
"print('Current location: ', os.getcwd())"
]
},
{
"cell_type": "markdown",
"id": "a578f2ab",
"metadata": {},
"source": [
"# Multiple Models Interface Usage"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7a9241de",
"metadata": {
"ExecuteTime": {
"end_time": "2023-12-10T22:37:45.918385Z",
"start_time": "2023-12-10T22:37:44.390547Z"
}
},
"outputs": [],
"source": [
"import os\n",
"import pandas as pd\n",
"\n",
"from virny.datasets import ACSIncomeDataset\n",
"from virny.custom_classes.metrics_composer import MetricsComposer\n",
"from virny.custom_classes.metrics_interactive_visualizer import MetricsInteractiveVisualizer"
]
},
{
"cell_type": "code",
"execution_count": 5,
"outputs": [],
"source": [
"data_loader = ACSIncomeDataset(state=['GA'], year=2018, with_nulls=False, subsample_size=15_000, subsample_seed=42)\n",
"sensitive_attributes_dct = {'SEX': '2', 'RAC1P': ['2', '3', '4', '5', '6', '7', '8', '9'], 'SEX&RAC1P': None}"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-12-10T22:37:47.214487Z",
"start_time": "2023-12-10T22:37:45.921391Z"
}
},
"id": "d3c53c7b72ecbcd0"
},
{
"cell_type": "code",
"execution_count": 6,
"outputs": [],
"source": [
"ROOT_DIR = os.path.join('docs', 'examples')\n",
"subgroup_metrics_df = pd.read_csv(os.path.join(ROOT_DIR, 'income_subgroup_metrics.csv'), header=0)\n",
"subgroup_metrics_df['Model_Name'] = (subgroup_metrics_df['Model_Name'] + '__alpha=' +\n",
" subgroup_metrics_df['Intervention_Param'].astype(str))"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-12-10T22:37:47.242581Z",
"start_time": "2023-12-10T22:37:47.214727Z"
}
},
"id": "2aab7c79ecdee914"
},
{
"cell_type": "code",
"execution_count": 7,
"outputs": [
{
"data": {
"text/plain": " Metric SEX RAC1P SEX&RAC1P \\\n0 Accuracy_Parity 0.047756 0.074977 0.065217 \n1 Aleatoric_Uncertainty_Parity -0.039005 -0.011947 -0.009222 \n2 Aleatoric_Uncertainty_Ratio 0.935159 0.979638 0.984220 \n3 Equalized_Odds_FNR 0.030793 -0.110745 -0.052498 \n4 Equalized_Odds_FPR -0.021317 0.000952 -0.007008 \n\n Model_Name \n0 LGBMClassifier__alpha=0.7 \n1 LGBMClassifier__alpha=0.7 \n2 LGBMClassifier__alpha=0.7 \n3 LGBMClassifier__alpha=0.7 \n4 LGBMClassifier__alpha=0.7 ",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Metric</th>\n <th>SEX</th>\n <th>RAC1P</th>\n <th>SEX&amp;RAC1P</th>\n <th>Model_Name</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Accuracy_Parity</td>\n <td>0.047756</td>\n <td>0.074977</td>\n <td>0.065217</td>\n <td>LGBMClassifier__alpha=0.7</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Aleatoric_Uncertainty_Parity</td>\n <td>-0.039005</td>\n <td>-0.011947</td>\n <td>-0.009222</td>\n <td>LGBMClassifier__alpha=0.7</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Aleatoric_Uncertainty_Ratio</td>\n <td>0.935159</td>\n <td>0.979638</td>\n <td>0.984220</td>\n <td>LGBMClassifier__alpha=0.7</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Equalized_Odds_FNR</td>\n <td>0.030793</td>\n <td>-0.110745</td>\n <td>-0.052498</td>\n <td>LGBMClassifier__alpha=0.7</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Equalized_Odds_FPR</td>\n <td>-0.021317</td>\n <td>0.000952</td>\n <td>-0.007008</td>\n <td>LGBMClassifier__alpha=0.7</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_names = subgroup_metrics_df['Model_Name'].unique()\n",
"models_metrics_dct = dict()\n",
"for model_name in model_names:\n",
" models_metrics_dct[model_name] = subgroup_metrics_df[subgroup_metrics_df['Model_Name'] == model_name]\n",
"\n",
"metrics_composer = MetricsComposer(models_metrics_dct, sensitive_attributes_dct)\n",
"models_composed_metrics_df = metrics_composer.compose_metrics()\n",
"models_composed_metrics_df.head()"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-12-10T22:37:47.297089Z",
"start_time": "2023-12-10T22:37:47.240439Z"
}
},
"id": "44ee5eff6054ce04"
},
{
"cell_type": "code",
"execution_count": 8,
"outputs": [
{
"data": {
"text/plain": "dict_keys(['LGBMClassifier__alpha=0.7', 'LGBMClassifier__alpha=0.0', 'LGBMClassifier__alpha=0.4', 'LogisticRegression__alpha=0.0', 'LogisticRegression__alpha=0.7', 'LogisticRegression__alpha=0.4', 'MLPClassifier__alpha=0.0', 'MLPClassifier__alpha=0.7', 'MLPClassifier__alpha=0.4', 'RandomForestClassifier__alpha=0.4', 'RandomForestClassifier__alpha=0.7', 'RandomForestClassifier__alpha=0.0'])"
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"models_metrics_dct.keys()"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-12-10T22:37:47.328697Z",
"start_time": "2023-12-10T22:37:47.295950Z"
}
},
"id": "15ed7d1ba1f22317"
},
{
"cell_type": "markdown",
"id": "deb45226",
"metadata": {},
"source": [
"## Metrics Visualization and Reporting"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "435b9d98",
"metadata": {
"ExecuteTime": {
"end_time": "2023-12-10T22:37:47.374721Z",
"start_time": "2023-12-10T22:37:47.317882Z"
}
},
"outputs": [],
"source": [
"visualizer = MetricsInteractiveVisualizer(data_loader.X_data, data_loader.y_data,\n",
" models_metrics_dct, models_composed_metrics_df,\n",
" sensitive_attributes_dct=sensitive_attributes_dct)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running on local URL: http://127.0.0.1:7860\n",
"\n",
"To create a public link, set `share=True` in `launch()`.\n",
"Keyboard interruption in main thread... closing server.\n"
]
}
],
"source": [
"visualizer.start_web_app()"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-12-11T00:26:17.429094Z",
"start_time": "2023-12-10T22:37:47.343749Z"
}
},
"id": "678a9dc8d51243f4"
},
{
"cell_type": "code",
"execution_count": 11,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Closing server running on port: 7860\n"
]
}
],
"source": [
"visualizer.stop_web_app()"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-12-11T00:26:17.482944Z",
"start_time": "2023-12-11T00:26:17.438287Z"
}
},
"id": "277b6d1de837dab7"
},
{
"cell_type": "code",
"execution_count": 11,
"outputs": [],
"source": [],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-12-11T00:26:17.483195Z",
"start_time": "2023-12-11T00:26:17.479725Z"
}
},
"id": "21c0ad91536f0af5"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
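The notebook builds `models_metrics_dct` by filtering `subgroup_metrics_df` once per name returned by `unique()`. A single `groupby` pass gives the same mapping. This is a hedged alternative, not part of Virny: the helper names `add_alpha_suffix` and `split_metrics_by_model` are hypothetical, and the sketch assumes only the `Model_Name` and `Intervention_Param` columns already used in the notebook. Passing `sort=False` keeps groups in first-appearance order, which is the order `Series.unique()` returns, so the dictionary keys come out in the same order as with the loop.

```python
import pandas as pd


def add_alpha_suffix(subgroup_metrics_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose Model_Name carries the intervention parameter.

    Hypothetical helper mirroring the notebook cell, e.g. 'LGBMClassifier' with
    Intervention_Param 0.7 becomes 'LGBMClassifier__alpha=0.7'.
    """
    named_df = subgroup_metrics_df.copy()  # leave the caller's frame untouched
    named_df['Model_Name'] = (named_df['Model_Name'] + '__alpha=' +
                              named_df['Intervention_Param'].astype(str))
    return named_df


def split_metrics_by_model(subgroup_metrics_df: pd.DataFrame) -> dict:
    """Map each model name to its rows, scanning the frame once.

    Hypothetical helper: equivalent to the boolean-mask loop over unique() names.
    """
    # sort=False keeps groups in order of first appearance, matching Series.unique()
    return {model_name: model_df
            for model_name, model_df in subgroup_metrics_df.groupby('Model_Name', sort=False)}
```

In the notebook, `models_metrics_dct = split_metrics_by_model(subgroup_metrics_df)` would stand in for the `for` loop before the dictionary is passed to `MetricsComposer`; each per-model frame keeps its original row index, as the boolean-mask version does.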