Welcome to the Automatic EDA Tools repository! This repository contains examples and documentation for various tools that automate Exploratory Data Analysis (EDA) in Python. These tools help data scientists and analysts quickly understand their datasets and generate insightful reports with minimal effort.
Exploratory Data Analysis (EDA) is a crucial step in the data analysis process. It involves summarizing the main characteristics of a dataset, often using visual methods. Automating EDA can save time and provide a comprehensive overview of the data. This repository showcases four popular Python tools for automatic EDA: Pandas Profiling, AutoViz, Sweetviz, and D-Tale.
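To make "summarizing the main characteristics of a dataset" concrete, here is a minimal, dependency-free sketch of the kind of per-column summary the tools below automate. The function name `summarize_columns` and the inline sample data are hypothetical and used only for illustration; none of the four tools is implemented this way.

```python
import csv
import io
import statistics


def summarize_columns(rows):
    """Return per-column count, missing count, and basic statistics."""
    summary = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        # Treat empty strings and None as missing values.
        present = [v for v in values if v not in ("", None)]
        stats = {"count": len(values), "missing": len(values) - len(present)}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = None  # non-numeric column
        if numbers:
            stats["mean"] = statistics.mean(numbers)
            stats["min"] = min(numbers)
            stats["max"] = max(numbers)
        else:
            # For text columns, report the number of distinct values instead.
            stats["unique"] = len(set(present))
        summary[name] = stats
    return summary


# Hypothetical sample data standing in for 'your_dataset.csv'.
SAMPLE_CSV = "age,city\n34,Paris\n,Lima\n28,Paris\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize_columns(rows)
for column, stats in summary.items():
    print(column, stats)
```

Writing and maintaining this by hand for every dataset is the repetitive work that the automatic EDA tools take over, adding plots and correlations on top.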
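Pandas Profiling generates a profile report from a pandas DataFrame. The report covers per-column statistics and visualizations, including missing values, correlations, and distributions. The project is now published as ydata-profiling; the pandas_profiling name used below refers to the earlier releases.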
Features:
- Descriptive statistics for each column
- Visualizations for distributions and correlations
- Interactive HTML reports
Example:

    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv('your_dataset.csv')
    profile = ProfileReport(df, title="Pandas Profiling Report")
    profile.to_file("output.html")  # write the report as an HTML file
AutoViz automatically visualizes any dataset with a single line of code. It can handle both small and large datasets and provides a variety of plots to understand the data better.
Features:
- Automatic visualization of data
- Handles large datasets efficiently
- Variety of plots including scatter, bar, and box plots
Example:

    from autoviz.AutoViz_Class import AutoViz_Class

    AV = AutoViz_Class()
    df = AV.AutoViz('your_dataset.csv')  # reads the CSV and generates the plots
Sweetviz creates high-density visualizations to help you understand your data quickly. It analyzes each feature in detail and can compare two datasets, such as a training set and a test set.
Features:
- High-density visualizations
- Detailed feature analysis
- Comparison of datasets
Example:

    import pandas as pd
    import sweetviz as sv

    df = pd.read_csv('your_dataset.csv')
    report = sv.analyze(df)
    report.show_html('sweetviz_report.html')  # save the report as HTML
D-Tale is a powerful tool that combines the capabilities of a pandas DataFrame with an interactive web-based interface. It allows you to explore and analyze your data in a user-friendly environment.
Features:
- Interactive web-based interface
- Real-time data manipulation and visualization
- Integration with pandas DataFrames
Example:

    import dtale
    import pandas as pd

    df = pd.read_csv('your_dataset.csv')
    d = dtale.show(df)  # start a D-Tale session for this DataFrame
    d.open_browser()
To install these tools, you can use pip:

    pip install pandas-profiling autoviz sweetviz dtale
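
After installing, it can help to confirm which of the four packages are actually importable in the current environment. The following is a small sketch using only the standard library; the `TOOLS` mapping and the `check_installed` helper are illustrative names, and the module names are taken from the import statements in the examples above.

```python
import importlib.util

# pip package name -> importable module name (as used in the examples above).
TOOLS = {
    "pandas-profiling": "pandas_profiling",
    "autoviz": "autoviz",
    "sweetviz": "sweetviz",
    "dtale": "dtale",
}


def check_installed(tools=TOOLS):
    """Map each pip package name to True if its module can be located."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in tools.items()
    }


status = check_installed()
for package, installed in status.items():
    print(f"{package}: {'installed' if installed else 'missing'}")
```

This only checks that each module can be found; it does not import the packages or verify their versions.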
Refer to the examples provided above for each tool to get started with automatic EDA. You can also check the official documentation for more detailed usage instructions.
Contributions are welcome! Please feel free to submit a Pull Request or open an Issue if you have any suggestions or improvements.
This repository is licensed under the MIT License. See the LICENSE file for more details.