gokcengiz/E-Commerce-Exploratory-Data-Analysis

E-Commerce Exploratory Data Analysis

This project conducts exploratory data analysis on e-commerce data using Python. Our goal is to understand the dataset, discover interesting patterns, and extract valuable insights from it. The dataset used for this application is linked from the project repository.

This is a transnational dataset containing all transactions that occurred between 01/12/2010 and 09/12/2011 for a UK-based, registered non-store online retailer. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers.

The columns in this dataset can be used to answer many questions, such as the following.

• How many different products are there in an order?

• What are the best-selling products?

• Which countries have the highest number of orders and total order value?

• On which date was the highest spending recorded?

• Which customers have made the highest purchases on date X?

• Who are the customers with the lowest purchase amounts?

• From which countries does customer X make purchases?
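As a rough illustration of how questions like these map onto Pandas operations, here is a minimal sketch. The column names (`InvoiceNo`, `Description`, `Quantity`, `UnitPrice`, `CustomerID`, `Country`), the derived `TotalPrice` column, and the sample rows are all assumptions made up for this example. They are not taken from the project's notebook.

```python
import pandas as pd

# Tiny hypothetical sample; column names and values are assumed for illustration.
orders = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A3", "A3"],
    "Description": ["MUG", "CANDLE", "MUG", "CARD", "MUG", "CANDLE"],
    "Quantity": [2, 1, 5, 10, 1, 3],
    "UnitPrice": [3.0, 4.5, 3.0, 0.5, 3.0, 4.5],
    "CustomerID": [101, 101, 102, 103, 103, 103],
    "Country": ["United Kingdom", "United Kingdom", "France",
                "Germany", "Germany", "Germany"],
})

# Value of each line item (assumed definition: quantity times unit price).
orders["TotalPrice"] = orders["Quantity"] * orders["UnitPrice"]

# How many different products are there in an order?
products_per_order = orders.groupby("InvoiceNo")["Description"].nunique()

# What are the best-selling products (by units sold)?
best_sellers = (
    orders.groupby("Description")["Quantity"].sum().sort_values(ascending=False)
)

# Which countries have the most orders and the highest total order value?
by_country = orders.groupby("Country").agg(
    n_orders=("InvoiceNo", "nunique"),
    total_value=("TotalPrice", "sum"),
).sort_values("total_value", ascending=False)

# From which countries does customer X (here 103) make purchases?
customer_countries = (
    orders.loc[orders["CustomerID"] == 103, "Country"].unique().tolist()
)

print(products_per_order, best_sellers, by_country, customer_countries, sep="\n\n")
```

Most of the listed questions reduce to the same pattern: filter or group by one column, then aggregate another.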

Project Description

This project will be carried out using the Python programming language. Our main objectives are:

• Gain a general overview of the e-commerce dataset and its contents.

• Apply data cleaning and preprocessing steps, handle missing data, and address data inconsistencies.

• Visualize the data to identify interesting patterns, trends, and outliers.

• Perform basic statistical analyses to answer relevant questions.

• Understand customer profiles through segmentation and grouping analyses.
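One simple way to approach the segmentation objective is to aggregate transactions per customer and bin the result. The sketch below assumes hypothetical `CustomerID`, `InvoiceNo`, and `TotalPrice` columns, and the spending thresholds are arbitrary values chosen only for illustration. The project itself may use a different approach.

```python
import pandas as pd

# Hypothetical transactions; names, values, and thresholds are illustrative only.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3, 4],
    "InvoiceNo": ["A", "B", "C", "D", "E", "F", "G"],
    "TotalPrice": [20.0, 35.0, 500.0, 5.0, 7.5, 2.5, 120.0],
})

# One row per customer: number of distinct orders and total amount spent.
profile = tx.groupby("CustomerID").agg(
    n_orders=("InvoiceNo", "nunique"),
    total_spent=("TotalPrice", "sum"),
)

# Bin customers by total spend; the bin edges here are assumed, not derived.
profile["segment"] = pd.cut(
    profile["total_spent"],
    bins=[0, 50, 200, float("inf")],
    labels=["low", "medium", "high"],
)

print(profile)
```

In practice the bin edges would more likely come from the data itself, for example from quantiles of `total_spent`.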

Technologies Used

Python: The core programming language for data analysis, manipulation, visualization, and statistical analysis.

Pandas: A powerful library for data manipulation and analysis.

NumPy: A fundamental package for numerical computations in Python.

Matplotlib and Seaborn: Libraries for data visualization.

PyCharm: An integrated development environment for Python.

Jupyter Notebook: Used for interactive coding and documenting analysis results.

Analysis Steps

1- Data Loading and Overview:

• Load the e-commerce dataset using Pandas.

• Display basic information about the dataset, such as column names, data types, and the first few rows.
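A minimal sketch of this step might look like the following. To keep the example self-contained, it reads a few made-up rows from an in-memory string instead of the real file. The column names and values are assumptions, and with the actual dataset you would pass its path to `pd.read_csv` (or use `pd.read_excel` for a spreadsheet).

```python
import io

import pandas as pd

# Stand-in for the real file: a few hypothetical rows in CSV form.
csv_text = """InvoiceNo,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
1001,WHITE MUG,6,2010-12-01 08:26:00,2.55,501,United Kingdom
1002,RED CANDLE,2,2010-12-01 08:28:00,1.85,501,United Kingdom
1003,GIFT CARD,12,2010-12-01 08:34:00,0.42,,France
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # number of rows and columns
print(df.columns.tolist())  # column names
print(df.dtypes)            # data type of each column
print(df.head())            # first few rows
df.info()                   # non-null counts and memory usage
print(df.isna().sum())      # missing values per column
```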

2- Data Cleaning and Preprocessing:

• Handle missing data by imputation or removal.

• Address data inconsistencies and anomalies.

• Convert data types if needed.
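The cleaning steps above could be sketched as follows. The column names and sample rows are hypothetical. Dropping rows without a customer ID, filling missing descriptions with a placeholder, and treating non-positive quantities as anomalies are assumed policies chosen for illustration. The right choices depend on what those values mean in the actual dataset.

```python
import pandas as pd

# Hypothetical raw data with a duplicate, missing values, and a negative quantity.
raw = pd.DataFrame({
    "InvoiceNo": ["1", "2", "2", "3", "4", "5"],
    "Description": ["MUG", "CANDLE", "CANDLE", None, "CARD", "MUG"],
    "Quantity": [2, 1, 1, 4, 5, -3],
    "InvoiceDate": ["2011-01-05 10:00", "2011-01-06 11:30", "2011-01-06 11:30",
                    "2011-01-07 09:15", "2011-01-08 14:45", "2011-01-09 16:20"],
    "UnitPrice": [3.0, 4.5, 4.5, 1.0, 0.5, 3.0],
    "CustomerID": [101.0, 102.0, 102.0, 103.0, None, 104.0],
})

# Remove exact duplicate rows.
clean = raw.drop_duplicates()

# Missing data, removal: drop rows that cannot be tied to a customer.
clean = clean.dropna(subset=["CustomerID"])

# Missing data, imputation: fill missing descriptions with a placeholder.
clean["Description"] = clean["Description"].fillna("UNKNOWN")

# Anomalies (assumed rule): keep only positive quantities and prices.
clean = clean[(clean["Quantity"] > 0) & (clean["UnitPrice"] > 0)].copy()

# Type conversion: parse dates and store customer IDs as integers.
clean["InvoiceDate"] = pd.to_datetime(clean["InvoiceDate"])
clean["CustomerID"] = clean["CustomerID"].astype(int)

print(clean)
print(clean.dtypes)
```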

3- Data Visualization:

• Create histograms, box plots, and scatter plots to visualize distributions and relationships.

• Use bar plots and pie charts to show categorical data proportions.

• Identify outliers and potential anomalies through visual exploration.
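As an illustration of these plot types, here is a small sketch using Matplotlib and Pandas plotting on made-up data. The column names, values, and the output filename `eda_overview.png` are assumptions. The 1.5 × IQR rule used to flag outliers is one common convention, not necessarily the method used in this project.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample with one unusually large quantity.
sales = pd.DataFrame({
    "Country": ["United Kingdom", "United Kingdom", "France", "Germany",
                "France", "United Kingdom", "Germany", "United Kingdom"],
    "Quantity": [2, 3, 1, 4, 2, 3, 5, 120],
    "UnitPrice": [2.5, 1.0, 4.0, 3.5, 2.0, 1.5, 3.0, 0.5],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Histogram: distribution of a numeric column.
axes[0, 0].hist(sales["UnitPrice"], bins=5)
axes[0, 0].set_title("Unit price distribution")

# Box plot: spread of a column, with extreme points drawn individually.
axes[0, 1].boxplot(sales["Quantity"])
axes[0, 1].set_title("Quantity box plot")

# Scatter plot: relationship between two numeric columns.
axes[1, 0].scatter(sales["Quantity"], sales["UnitPrice"])
axes[1, 0].set_xlabel("Quantity")
axes[1, 0].set_ylabel("Unit price")

# Bar plot: number of rows per category.
sales["Country"].value_counts().plot(kind="bar", ax=axes[1, 1])
axes[1, 1].set_title("Rows per country")

fig.tight_layout()
fig.savefig("eda_overview.png")  # hypothetical output filename

# Flag outliers numerically with the 1.5 * IQR rule.
q1, q3 = sales["Quantity"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = sales[(sales["Quantity"] < q1 - 1.5 * iqr) |
                 (sales["Quantity"] > q3 + 1.5 * iqr)]
print(outliers)
```

Seaborn offers equivalent functions such as `histplot`, `boxplot`, and `scatterplot` for the same charts.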