Synthetic Data Generator App

A Streamlit web application for generating synthetic data with the YData Synthetic library. Users upload a real-world dataset and generate a synthetic one that mimics its statistical properties while helping to preserve privacy.

🚀 Features

- Upload and process tabular classification datasets
- Configure GAN model parameters through an intuitive UI
- Generate synthetic data using CGAN or WGAN-GP models
- Visual comparison between real and synthetic data distributions
- Download synthetic datasets in CSV format
- Interactive parameter selection with real-time updates

📋 Prerequisites

- pandas
- matplotlib
- numpy
- seaborn
- streamlit
- ydata-synthetic==1.4.0
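
Because ydata-synthetic is pinned to an exact version, it can help to confirm what is installed before launching the app. The sketch below uses only the Python standard library. The helper name `check_pins` and the `PINS` mapping are illustrative and are not part of this repository.

```python
from importlib import metadata

# Pins mirror the Prerequisites list; only ydata-synthetic has a fixed version there.
PINS = {"ydata-synthetic": "1.4.0"}


def check_pins(pins):
    """Return {package: (expected, installed)} for every pin that is not satisfied.

    `installed` is None when the package is not installed at all.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package matches; anything else lists the expected and the installed version side by side.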

🛠️ Installation

Clone this repository:

git clone https://github.com/rajeshai/ydata-synthetic-streamlit.git
cd ydata-synthetic-streamlit

Install the required packages:

pip install -r requirements.txt

💻 Usage

Run the Streamlit app:

streamlit run app.py

Open your web browser and navigate to the local URL that Streamlit prints in the terminal (typically http://localhost:8501).
Follow these steps in the application:
- Upload your preprocessed tabular classification dataset (CSV format)
- Select the GAN model (CGAN or WGAN-GP)
- Choose numerical and categorical columns
- Configure model parameters:
  - Noise dimension
  - Layer dimension
  - Batch size
  - Sample interval
  - Number of epochs
  - Learning rate
  - Beta coefficients
- Specify the number of synthetic samples to generate
- Click the training button to start the process
- Download the generated synthetic dataset
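
The column-selection step above is manual, but a first guess at the split can be computed from the data. The following is a minimal sketch, assuming the CSV has been loaded into a pandas DataFrame. The function name `suggest_columns` and the `max_categories` threshold are hypothetical and are not taken from the app.

```python
import pandas as pd


def suggest_columns(df, max_categories=10):
    """Split column names into (numerical, categorical) suggestions.

    A numeric column with at most `max_categories` distinct values is treated
    as categorical; that threshold is an arbitrary illustrative choice.
    """
    numerical, categorical = [], []
    for col in df.columns:
        is_numeric = pd.api.types.is_numeric_dtype(df[col])
        if is_numeric and df[col].nunique() > max_categories:
            numerical.append(col)
        else:
            categorical.append(col)
    return numerical, categorical
```

The result is only a starting point: an integer-coded class label, for example, lands in the categorical list, and the final choice still belongs to the user.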

🔧 Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| Model | Choose between CGAN or WGAN-GP | CGAN |
| Noise Dimension | Input noise dimension for GAN | 128 |
| Layer Dimension | Dimension of network layers | 128 |
| Batch Size | Number of samples per training batch | 500 |
| Sample Interval | Interval for sampling during training | 100 |
| Epochs | Number of training epochs | 2 |
| Learning Rate | Model learning rate (x1e-3) | 0.05 |
| Beta Coefficients | Adam optimizer beta parameters | β1=0.5, β2=0.9 |
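
To make the defaults concrete, the sketch below collects them in a small dataclass. It assumes that "(x1e-3)" means the value entered in the UI is multiplied by 1e-3 before training, so the default of 0.05 would correspond to 5e-5. The class `GanSettings` and its field names are hypothetical and do not mirror the app's code or the ydata-synthetic API.

```python
from dataclasses import dataclass


@dataclass
class GanSettings:
    """Hypothetical container mirroring the defaults in the Parameters table."""

    model: str = "CGAN"               # or "WGAN-GP"
    noise_dim: int = 128
    layer_dim: int = 128
    batch_size: int = 500
    sample_interval: int = 100
    epochs: int = 2
    learning_rate_x1e3: float = 0.05  # value as entered in the UI
    beta_1: float = 0.5
    beta_2: float = 0.9

    @property
    def learning_rate(self) -> float:
        """Learning rate after applying the assumed 1e-3 scale factor."""
        return self.learning_rate_x1e3 * 1e-3

    def validate(self) -> None:
        """Raise ValueError for settings outside the ranges assumed here."""
        if self.model not in {"CGAN", "WGAN-GP"}:
            raise ValueError(f"unsupported model: {self.model}")
        if not (0.0 <= self.beta_1 < 1.0 and 0.0 <= self.beta_2 < 1.0):
            raise ValueError("Adam betas must lie in [0, 1)")
        if min(self.noise_dim, self.layer_dim, self.batch_size, self.epochs) < 1:
            raise ValueError("dimensions, batch size, and epochs must be positive")
```

Calling `GanSettings().validate()` passes with the defaults, and `GanSettings().learning_rate` gives the scaled value under the assumption stated above.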

📊 Output

The application provides:

- Visual comparisons between real and synthetic data distributions
- Downloadable synthetic dataset in CSV format
- Training progress indicators
- Success notifications with celebratory balloons! 🎈
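
After downloading the CSV, a numeric summary can complement the visual comparison. The sketch below assumes both datasets are pandas DataFrames that share the listed numerical columns. The function `compare_summary` is illustrative and is not something the app provides.

```python
import pandas as pd


def compare_summary(real, synthetic, columns):
    """Return mean and standard deviation of each column for both datasets."""
    rows = []
    for col in columns:
        rows.append({
            "column": col,
            "real_mean": real[col].mean(),
            "synthetic_mean": synthetic[col].mean(),
            "real_std": real[col].std(),
            "synthetic_std": synthetic[col].std(),
        })
    return pd.DataFrame(rows).set_index("column")
```

Large gaps between the real and synthetic columns of the result suggest that more epochs or different parameters may be worth trying.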

⚠️ Notes

- For demonstration purposes, it's recommended to start with a small number of epochs
- The application requires preprocessed data with no missing values
- Training time depends on dataset size and selected parameters
- The app uses caching to optimize performance
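
Since the app expects data with no missing values, a quick check before uploading can save a failed run. This is a minimal sketch using pandas. The helper names `find_missing` and `require_complete` are hypothetical and are not part of the app.

```python
import pandas as pd


def find_missing(df):
    """Return a Series of missing-value counts for columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


def require_complete(df):
    """Return df unchanged, or raise ValueError if it contains missing values."""
    missing = find_missing(df)
    if not missing.empty:
        details = ", ".join(f"{col} ({n})" for col, n in missing.items())
        raise ValueError(f"missing values found in: {details}")
    return df
```

A typical use would be `require_complete(pd.read_csv("data.csv"))`, where `data.csv` stands in for your own file. How to handle the gaps (dropping rows or imputing values) depends on the dataset and is left to the preprocessing step.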

🙏 Acknowledgments

- YData Synthetic Library for the synthetic data generation capabilities
- Streamlit for the wonderful web application framework
