A short project focused on data cleaning, using a raw dataset of Coursera's courses.

mainguyen2911/Cleaning-Coursera-Dataset

Cleaning-Coursera-Dataset

This project focuses on the data cleaning process. The raw dataset is available on Kaggle.

This repository includes:

  • README.md
  • LICENSE
  • Data Cleaning - CourseraDataset.ipynb: the Jupyter Notebook that contains all the code for the cleaning process.
  • CourseraDataset-Clean.csv: the cleaned dataset produced from the cleaning process.

During this cleaning process, I was able to:

  • Remove 900 duplicate values.
  • Convert the text fields Duration and Review into numbers for further analysis.
  • Clean text fields that were initially displayed as lists.
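
The three steps above can be illustrated with a minimal sketch. It is not the notebook's actual code: apart from Duration and Review, the column names (Title, Skills), the sample rows, the raw text formats, and the helper functions are all assumptions made for illustration, and it uses only the Python standard library.

```python
import ast
import re


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def first_number(text):
    """Return the first number found in a string as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def flatten_list_text(text):
    """Turn a list-like string such as "['a', 'b']" into "a, b"."""
    try:
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return text.strip()  # not a Python literal: leave the text as it is
    if isinstance(parsed, (list, tuple)):
        return ", ".join(str(item).strip() for item in parsed)
    return text.strip()


# Hypothetical sample rows; the real dataset's values may look different.
raw_rows = [
    {"Title": "Intro to SQL", "Duration": "Approx. 12 hours",
     "Review": "1,204 reviews", "Skills": "['SQL', 'Databases']"},
    {"Title": "Intro to SQL", "Duration": "Approx. 12 hours",
     "Review": "1,204 reviews", "Skills": "['SQL', 'Databases']"},
    {"Title": "Python Basics", "Duration": "20 hours",
     "Review": "87 reviews", "Skills": "['Python']"},
]

clean_rows = [
    {
        **row,
        "Duration": first_number(row["Duration"]),
        "Review": first_number(row["Review"]),
        "Skills": flatten_list_text(row["Skills"]),
    }
    for row in drop_duplicates(raw_rows)
]

for row in clean_rows:
    print(row)
```

With pandas, the same helpers could be applied per column, but the order of operations matters less than checking a sample of converted values against the raw text before discarding the original columns.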
