A Streamlit web application for generating synthetic data using the YData Synthetic library. This tool allows users to create synthetic datasets that mimic their real-world data while preserving privacy and statistical properties.
- Upload and process tabular classification datasets
- Configure GAN model parameters through an intuitive UI
- Generate synthetic data using CGAN or WGAN-GP models
- Compare real and synthetic data distributions visually
- Download synthetic datasets in CSV format
- Interactive parameter selection with real-time updates
pandas
matplotlib
numpy
seaborn
streamlit
ydata-synthetic==1.4.0
- Clone this repository:
git clone https://github.com/rajeshai/ydata-synthetic-streamlit.git
cd ydata-synthetic-streamlit
- Install the required packages:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run app.py
- Open your web browser and navigate to the provided local URL (typically http://localhost:8501)
- Follow these steps in the application:
- Upload your preprocessed tabular classification dataset (CSV format)
- Select the GAN model (CGAN or WGAN-GP)
- Choose numerical and categorical columns
- Configure model parameters:
  - Noise dimension
  - Layer dimension
  - Batch size
  - Sample interval
  - Number of epochs
  - Learning rate
  - Beta coefficients
- Specify the number of synthetic samples to generate
- Click the training button to start the process
- Download the generated synthetic dataset
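The column-selection step above asks you to separate numerical from categorical columns. As a minimal sketch of one way to pre-sort them before opening the app, the helper below treats a column as numerical when every value parses as a float. The name `split_columns` and the sample data are hypothetical and not part of the app.

```python
import csv
import io


def split_columns(csv_text):
    """Return (numerical, categorical) column names from CSV text.

    Hypothetical helper: a column counts as numerical only if every
    value in it can be parsed as a float.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    numerical, categorical = [], []
    for name in reader.fieldnames or []:
        try:
            for row in rows:
                float(row[name])
            numerical.append(name)
        except (TypeError, ValueError):
            # A non-numeric or absent value makes the column categorical.
            categorical.append(name)
    return numerical, categorical


# Illustrative data only.
sample = "age,income,city,label\n34,52000.5,Paris,yes\n51,61000,Lyon,no\n"
num_cols, cat_cols = split_columns(sample)
print(num_cols)  # ['age', 'income']
print(cat_cols)  # ['city', 'label']
```

Integer-coded categories (for example a 0/1 class label) would be reported as numerical by this heuristic, so review the result and move such columns by hand.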
| Parameter | Description | Default |
|---|---|---|
| Model | GAN architecture: CGAN or WGAN-GP | CGAN |
| Noise Dimension | Dimension of the input noise vector | 128 |
| Layer Dimension | Dimension of the network layers | 128 |
| Batch Size | Number of samples per training batch | 500 |
| Sample Interval | Interval for sampling during training | 100 |
| Epochs | Number of training epochs | 2 |
| Learning Rate | Model learning rate, entered in units of 1e-3 | 0.05 (that is, 5e-5) |
| Beta Coefficients | Adam optimizer beta parameters | β1=0.5, β2=0.9 |
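To make the defaults above easier to reuse outside the UI, here is a small sketch that collects them in one place. The `GanSettings` class and its field names are hypothetical and do not come from the app or from ydata-synthetic. It also assumes that the "(x1e-3)" note means the entered learning rate is multiplied by 1e-3 before training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GanSettings:
    """Hypothetical container mirroring the defaults in the table above."""

    model: str = "CGAN"
    noise_dim: int = 128
    layer_dim: int = 128
    batch_size: int = 500
    sample_interval: int = 100
    epochs: int = 2
    learning_rate_x1e3: float = 0.05  # value as entered in the UI
    beta_1: float = 0.5
    beta_2: float = 0.9

    def __post_init__(self):
        # Reject values the table does not allow.
        if self.model not in ("CGAN", "WGAN-GP"):
            raise ValueError(f"unsupported model: {self.model}")
        if not (0.0 <= self.beta_1 < 1.0 and 0.0 <= self.beta_2 < 1.0):
            raise ValueError("Adam betas must lie in [0, 1)")

    @property
    def learning_rate(self):
        """Learning rate after applying the assumed 1e-3 scale."""
        return self.learning_rate_x1e3 * 1e-3


defaults = GanSettings()
print(defaults.model, defaults.epochs, defaults.learning_rate)
```

Keeping the UI value and the scaled value separate avoids confusing 0.05 with the much smaller rate that would actually reach the optimizer under this assumption.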
The application provides:
- Visual comparisons between real and synthetic data distributions
- Downloadable synthetic dataset in CSV format
- Training progress indicators
- Success notifications with celebratory balloons! 🎈
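Visual comparisons can be complemented with a quick numeric check after downloading the synthetic CSV. The sketch below is not part of the app: `compare_column` is a hypothetical helper, and the two value lists are made-up stand-ins for one numerical column taken from the real and the synthetic dataset.

```python
import statistics


def compare_column(real, synthetic):
    """Return mean and standard deviation of both samples, plus their gaps."""
    real_mean = statistics.fmean(real)
    synth_mean = statistics.fmean(synthetic)
    real_std = statistics.stdev(real)
    synth_std = statistics.stdev(synthetic)
    return {
        "real_mean": real_mean,
        "synthetic_mean": synth_mean,
        "mean_gap": abs(real_mean - synth_mean),
        "real_std": real_std,
        "synthetic_std": synth_std,
        "std_gap": abs(real_std - synth_std),
    }


# Illustrative values only, not output from the app.
real_age = [23.0, 35.0, 41.0, 52.0, 29.0]
synthetic_age = [25.0, 33.0, 44.0, 50.0, 28.0]
report = compare_column(real_age, synthetic_age)
for key, value in report.items():
    print(f"{key}: {value:.3f}")
```

Small gaps in mean and standard deviation do not prove that the distributions match, so treat this as a sanity check alongside the plots rather than a replacement for them.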
- For demonstrations, start with a small number of epochs (the default is 2)
- The application requires preprocessed data with no missing values
- Training time depends on dataset size and selected parameters
- The app uses caching to optimize performance
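Because the app expects data with no missing values, it can help to check a CSV before uploading it. The following is a minimal sketch using only the standard library. The function name `find_missing_cells` and the list of missing-value markers are assumptions, so adjust the markers to match how your data encodes missing entries.

```python
import csv
import io


def find_missing_cells(csv_text, missing_markers=("", "NA", "NaN", "null")):
    """Return (row_number, column_name) for each missing-looking cell.

    Row numbers start at 1 for the first data row. The marker list is an
    assumption, not something the app defines.
    """
    missing = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for name in reader.fieldnames:
            value = row.get(name)
            if value is None or value.strip() in missing_markers:
                missing.append((row_number, name))
    return missing


# Illustrative data only.
sample = "age,city\n34,Paris\n,Lyon\n29,NA\n"
print(find_missing_cells(sample))  # [(2, 'age'), (3, 'city')]
```

An empty result suggests the file is ready to upload; otherwise, impute or drop the reported cells first.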
- YData Synthetic Library for the synthetic data generation capabilities
- Streamlit for the wonderful web application framework