A Streamlit-based web application for analyzing categorical data, detecting bias, performing independence testing, and augmenting data.
- Dataset Upload & Analysis: Support for CSV file uploads with automatic categorical column detection
- Bias Detection: Perform Goodness-of-Fit tests to identify bias in categorical data
- Independence Testing: Analyze relationships between categorical variables
- AI-powered Data Augmentation: Enhance your dataset with intelligent data augmentation
- Interactive Visualization: Visual representation of test results and data distributions
- Export Functionality: Download augmented datasets in CSV format
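
To make the Bias Detection feature above more concrete, here is a minimal sketch of a chi-square Goodness-of-Fit statistic for one categorical column. It assumes a uniform expected distribution across the observed categories; the application may use a different expectation. The function name `goodness_of_fit_statistic` and the sample data are hypothetical and not taken from this project.

```python
from collections import Counter


def goodness_of_fit_statistic(values):
    """Chi-square goodness-of-fit statistic against a uniform expectation.

    Hypothetical helper for illustration; not the application's own code.
    Returns (statistic, degrees_of_freedom).
    """
    counts = Counter(values)
    n = sum(counts.values())
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    expected = n / k  # assumption: every category is equally likely
    statistic = sum((obs - expected) ** 2 / expected for obs in counts.values())
    return statistic, k - 1


# Made-up column with an uneven category split.
sample = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
statistic, dof = goodness_of_fit_statistic(sample)
print(f"chi2 = {statistic:.2f}, dof = {dof}")  # chi2 = 14.00, dof = 2
```

With 2 degrees of freedom the 5% critical value is about 5.991, so a statistic of 14.0 would point to a skewed distribution. In practice `scipy.stats.chisquare` computes both the statistic and the p-value.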
- Python 3.11 or higher
- pip (Python package installer)
- Docker (optional)
- Clone the repository
- Create and activate a virtual environment
# On Windows
python -m venv venv
.\venv\Scripts\activate
# On macOS/Linux
python3 -m venv venv
source venv/bin/activate
- Install dependencies
pip install -r requirements.txt
- Run the application
streamlit run src/Home.py
The application will be available at http://localhost:8501
- Pull the Docker image
docker pull adithyn/biasbalance
- Run the container
docker run --name biasbalance-app -p 8501:8501 adithyn/biasbalance
The application will be available at http://localhost:8501
Create a .env file in the root directory with the following variables (if needed):
OPENAI_API_KEY=your_api_key_here
You can also enter the API key when prompted in the application. It is stored in the session state and used for data augmentation.
Note that the OpenAI API key is optional and only required for the Data Augmentation feature.
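
The two ways of supplying the key can be pictured with the sketch below. It assumes the .env file has already been loaded into the process environment (for example by a loader such as python-dotenv) and that a key entered in the application takes priority over the environment variable. Both points are assumptions, and `resolve_api_key` is a hypothetical helper, not part of this project.

```python
import os


def resolve_api_key(session_value=None, env=None):
    """Return the OpenAI API key, or None if no key is configured.

    Hypothetical helper for illustration. Assumed priority:
    1. a key entered in the app (held in session state),
    2. the OPENAI_API_KEY environment variable.
    """
    env = os.environ if env is None else env
    key = (session_value or "").strip() or env.get("OPENAI_API_KEY", "").strip()
    return key or None


# Example with an explicit mapping standing in for the environment.
fake_env = {"OPENAI_API_KEY": "your_api_key_here"}
print(resolve_api_key(None, fake_env))       # your_api_key_here
print(resolve_api_key("typed-key", fake_env))  # typed-key
print(resolve_api_key(None, {}))             # None
```

Returning `None` when nothing is set matches the note above: the key is optional, so only the Data Augmentation feature needs to check for it.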
- Launch the application
- Upload your CSV dataset using the file uploader
- Categorical columns are preselected for you; adjust the selection if needed
- Use the sidebar to navigate between different testing options
- Perform Goodness-of-Fit tests to detect bias, and add the column for augmentation if bias is detected
- Analyze relationships between categorical variables using the Independence Testing feature
- Use the Data Augmentation feature to enhance your dataset
- View results and download augmented datasets if needed
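
The Independence Testing step in the workflow above can be sketched as a Pearson chi-square test on a contingency table of two categorical columns. This is a minimal illustration under stated assumptions, not the application's implementation: the function name `independence_statistic` and the sample data are hypothetical, and no continuity correction is applied.

```python
from collections import Counter


def independence_statistic(pairs):
    """Pearson chi-square statistic for independence of two categorical columns.

    Hypothetical helper for illustration. `pairs` is an iterable of
    (value_a, value_b) tuples, one per row of the dataset.
    Returns (statistic, degrees_of_freedom).
    """
    pairs = list(pairs)
    n = len(pairs)
    cells = Counter(pairs)                 # observed count per (a, b) cell
    rows = Counter(a for a, _ in pairs)    # marginal totals of column A
    cols = Counter(b for _, b in pairs)    # marginal totals of column B
    if len(rows) < 2 or len(cols) < 2:
        raise ValueError("each variable needs at least two categories")
    statistic = 0.0
    for r, r_total in rows.items():
        for c, c_total in cols.items():
            expected = r_total * c_total / n  # count expected under independence
            statistic += (cells[(r, c)] - expected) ** 2 / expected
    return statistic, (len(rows) - 1) * (len(cols) - 1)


# Made-up 2x2 example: group vs. outcome.
rows_of_data = (
    [("F", "yes")] * 30 + [("F", "no")] * 10
    + [("M", "yes")] * 10 + [("M", "no")] * 30
)
statistic, dof = independence_statistic(rows_of_data)
print(f"chi2 = {statistic:.2f}, dof = {dof}")  # chi2 = 20.00, dof = 1
```

With 1 degree of freedom the 5% critical value is about 3.841, so a statistic of 20.0 would suggest the two variables are related. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic for this example would be smaller.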