Exploratory Data Analysis (EDA) projects will be uploaded to this repository. Libraries such as Pandas, Matplotlib, Seaborn and Plotly will be used for data analysis.

MelihGulum/Exploratory-Data-Analysis-EDA

Exploratory Data Analysis Projects

EDA projects will be uploaded to this repository. The projects focus on data cleaning, visualization and exploration, and most of them use the Pandas, Seaborn, Matplotlib and Plotly libraries.

📖 Projects and Datasets

  1. Netflix Original

    • This dataset consists of all Netflix original films released as of June 1st, 2021. It also includes all Netflix documentaries and specials. The data was scraped from a Wikipedia page and then merged with a dataset of the corresponding IMDb scores. IMDb scores are voted on by community members, and the majority of the films have 1,000+ reviews.
    • The dataset is available on Kaggle.
  2. MovieLens

    • Exploratory Data Analysis of the MovieLens dataset, which is available on MovieLens. It contains 27,753,444 ratings and 1,108,997 tag applications across 58,098 movies. The data was created by 283,228 users between January 9, 1995 and September 26, 2018. The dataset was generated on September 26, 2018.
  3. Titanic

  4. Data Science Salary

    • The dataset contains salaries across different fields of the data science domain and is available on Kaggle.
  5. Heart Attack

    • This dataset contains information about people and their chances of having a heart attack.
  6. Glassdoor

    • The dataset was created from data science job postings on Glassdoor using Selenium.
  7. Pokemon

    • In this project, Pokemon data taken from Kaggle is examined. The notebook covers EDA and data preprocessing steps.
  8. Automated EDA Tools

    • This project benchmarks three popular Automated Exploratory Data Analysis (EDA) tools (Sweetviz, Pandas Profiling, and AutoViz) across three datasets of varying size and complexity: the Titanic dataset, House Prices - Advanced Regression Techniques, and the New York City Taxi Trip Duration dataset. By comparing runtime, memory usage, and CPU usage, the goal is to evaluate each tool's efficiency and scalability. The project aims to show the strengths and weaknesses of each tool, helping data practitioners choose the one best suited to their dataset and analysis requirements.
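
The kind of measurement described for the Automated EDA Tools project can be sketched with the Python standard library alone. This is a minimal illustration, not the code used in the project: the `benchmark` helper and the `fake_report` workload are hypothetical names, and `fake_report` only stands in for a real report-generation call from one of the tools. Note that `tracemalloc` tracks memory allocated through Python's allocator, so it may undercount memory used by native extensions.

```python
import time
import tracemalloc


def benchmark(func, *args, **kwargs):
    """Run func once and return (result, wall_seconds, cpu_seconds, peak_bytes).

    Hypothetical helper for illustration only.
    """
    tracemalloc.start()
    wall_start = time.perf_counter()   # wall-clock time
    cpu_start = time.process_time()    # CPU time of this process
    try:
        result = func(*args, **kwargs)
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, wall, cpu, peak


def fake_report(n_rows):
    """Hypothetical stand-in for an automated EDA report call."""
    data = [i * i for i in range(n_rows)]
    return sum(data)


if __name__ == "__main__":
    # Repeat the same workload on inputs of increasing size.
    for n_rows in (1_000, 100_000):
        _, wall, cpu, peak = benchmark(fake_report, n_rows)
        print(f"{n_rows:>7} rows: wall {wall:.4f} s, "
              f"cpu {cpu:.4f} s, peak {peak / 1024:.1f} KiB")
```

To compare real tools, `fake_report` would be replaced by a function that loads a dataset and calls the tool under test. Running each measurement several times and reporting the median would reduce noise.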
