Skip to content

Latest commit

 

History

History
131 lines (89 loc) · 4.3 KB

README.md

File metadata and controls

131 lines (89 loc) · 4.3 KB

Automatic EDA Tools

Welcome to the Automatic EDA Tools repository! This repository contains examples and documentation for various tools that automate Exploratory Data Analysis (EDA) in Python. These tools help data scientists and analysts quickly understand their datasets and generate insightful reports with minimal effort.

Table of Contents

Introduction

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process. It involves summarizing the main characteristics of a dataset, often using visual methods. Automating EDA can save time and provide a comprehensive overview of the data. This repository showcases four popular Python tools for automatic EDA: Pandas Profiling, AutoViz, Sweetviz, and D-Tale.

Tools

Pandas Profiling

Pandas Profiling generates profile reports from a pandas DataFrame. The report includes a variety of statistics and visualizations, such as missing values, correlations, and distributions.

  • Features:

    • Descriptive statistics for each column
    • Visualizations for distributions and correlations
    • Interactive HTML reports
  • Example:

    import pandas as pd
    from pandas_profiling import ProfileReport
    
    df = pd.read_csv('your_dataset.csv')
    profile = ProfileReport(df, title="Pandas Profiling Report")
    profile.to_file("output.html")

AutoViz

AutoViz automatically visualizes any dataset with a single line of code. It can handle both small and large datasets and provides a variety of plots to understand the data better.

  • Features:

    • Automatic visualization of data
    • Handles large datasets efficiently
    • Variety of plots including scatter, bar, and box plots
  • Example:

    from autoviz.AutoViz_Class import AutoViz_Class
    
    AV = AutoViz_Class()
    df = AV.AutoViz('your_dataset.csv')

Sweetviz

Sweetviz creates beautiful, high-density visualizations to help you understand your data quickly. It provides a detailed analysis of each feature and compares datasets.

  • Features:

    • High-density visualizations
    • Detailed feature analysis
    • Comparison of datasets
  • Example:

    import pandas as pd
    import sweetviz as sv
    
    df = pd.read_csv('your_dataset.csv')
    report = sv.analyze(df)
    report.show_html('sweetviz_report.html')

D-Tale

D-Tale is a powerful tool that combines the capabilities of a pandas DataFrame with an interactive web-based interface. It allows you to explore and analyze your data in a user-friendly environment.

  • Features:

    • Interactive web-based interface
    • Real-time data manipulation and visualization
    • Integration with pandas DataFrames
  • Example:

    import dtale
    import pandas as pd
    
    df = pd.read_csv('your_dataset.csv')
    d = dtale.show(df)
    d.open_browser()

Installation

To install these tools, you can use pip:

pip install pandas-profiling autoviz sweetviz dtale

Usage

Refer to the examples provided above for each tool to get started with automatic EDA. You can also check the official documentation for more detailed usage instructions.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an Issue if you have any suggestions or improvements.

License

This repository is licensed under the MIT License. See the LICENSE file for more details.


Source: (1) 4 Ways to Automate Exploratory Data Analysis (EDA) in Python. https://builtin.com/data-science/EDA-python. (2) Streamlining Data Exploration: A Comparison of Profiling Tools for .... https://dataroots.io/blog/streamlining-data-exploration-a-comparison-of-pandas-profiler-sweet-viz-and-pandas-gui-for-effective-eda. (3) Tools to Automate EDA - GeeksforGeeks. https://www.geeksforgeeks.org/tools-to-automate-eda/. (4) Sweetviz: Automate Exploratory Data Analysis (EDA) - CoderzColumn. https://coderzcolumn.com/tutorials/data-science/sweetviz-automate-exploratory-data-analysis-eda. (5) Automated EDA with Python - Open Source Automation. https://theautomatic.net/2021/07/02/automated-eda-with-python/.