Data-Cleaning-with-Pandas-and-Numpy

This project involves a detailed data cleaning process, documented in a Jupyter notebook. The Jupyter notebook contains detailed steps and methodologies used to transform the raw data into a more usable form.

Data Source

The dataset used in this project is the FIFA 21 Messy Raw Dataset gotten from Kaggle. It includes various player attributes and performance metrics from the popular video game FIFA 21.

Objectives

The main objectives of the data cleaning process include:

Handling missing values
Correcting data types
Data transformation for better analysis

Methodology

The Jupyter notebook details the following steps:

Initial Data Exploration:

Basic exploration to understand the structure and issues in the dataset.

Data Cleaning:

Data type corrections: Converting data into appropriate types.
- A function was defined to convert various string representations of measurements and values into their numerical equivalents.
- This function was used to handle conversions for units like centimeters ('cm'), feet and inches ('ft'in"), kilograms ('kg'), pounds ('lbs'), as well as monetary values in thousands ('K') and millions ('M'), and ratings indicated by stars ('★'). It's designed to parse and transform these diverse formats into a consistent float or integer format for easier data processing and analysis.
Data Transformation: Methods used for transforming data into a more analyzable format.
- The 'Joined' column, which is in the format 'Month day, year', was transformed into a pandas datetime object, handling any parsing errors.
- Three new columns: 'Joined_Year', 'Joined_Month', and 'Joined_Day' were created, extracting the respective year, month, and day from the converted datetime. This enhances the dataset by breaking down the joining date into separate, more analyzable components."
Handling missing values: Strategies used to fill or remove missing data.
- Drop the redundant columns and rows with missing values

Libraries Used

Pandas: For data manipulation and analysis.
NumPy: For numerical operations.

How to Use

Clone this repository.
Ensure you have Jupyter Notebook installed.
Open the notebook FIFA_21_Data_Cleaning.ipynb.
Install required libraries
Run the notebook to view the data cleaning process.

Contributions and Feedback

Feel free to fork this project, submit pull requests, or send suggestions to improve the data cleaning process.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Fifa Data Cleaning-checkpoint.ipynb		Fifa Data Cleaning-checkpoint.ipynb
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data-Cleaning-with-Pandas-and-Numpy

Data Source

Objectives

Methodology

Libraries Used

How to Use

Contributions and Feedback

About

Releases

Packages

Languages

License

obielin/Data-Cleaning-with-Pandas-and-Numpy

Folders and files

Latest commit

History

Repository files navigation

Data-Cleaning-with-Pandas-and-Numpy

Data Source

Objectives

Methodology

Libraries Used

How to Use

Contributions and Feedback

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages