mainguyen2911 / Cleaning-Coursera-Dataset Public


A short project focused on data cleaning, using a raw dataset of Coursera courses.


Cleaning-Coursera-Dataset

This project focuses on the data cleaning process. The raw dataset is available on Kaggle.

This repository includes:

README.md
LICENSE
Data Cleaning - CourseraDataset.ipynb: the Jupyter Notebook that contains all the code for the cleaning process.
CourseraDataset-Clean.csv: the cleaned dataset produced from the cleaning process.

During this cleaning process, I was able to:

Remove 900 duplicate values.
Convert the text fields Duration and Review into numbers for further analysis.
Clean text fields that were initially displayed as lists.
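
The three steps above can be sketched in plain Python. This is only an illustration, not the code from the notebook: the sample rows, the `Course` and `Skills` column names, the value formats (such as "Approx. 12 hours" or "1,234 reviews"), and the helper names `drop_duplicates`, `first_number`, `list_text_to_plain` and `clean` are all assumptions made for the example. Only the `Duration` and `Review` field names come from the list above.

```python
import ast
import re

# Hypothetical sample rows. The column names (other than Duration and Review)
# and the value formats are assumptions, not taken from the Kaggle dataset.
raw_rows = [
    {"Course": "Intro to Data", "Duration": "Approx. 12 hours",
     "Review": "1,234 reviews", "Skills": "['Python', 'SQL']"},
    {"Course": "Intro to Data", "Duration": "Approx. 12 hours",
     "Review": "1,234 reviews", "Skills": "['Python', 'SQL']"},
    {"Course": "Stats Basics", "Duration": "8 hours",
     "Review": "56 reviews", "Skills": "[]"},
]


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def first_number(text):
    """Return the first number found in text as a float, or None if absent."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Strip thousands separators before converting.
    return float(match.group().replace(",", ""))


def list_text_to_plain(text):
    """Turn a list-like string such as "['a', 'b']" into "a, b"."""
    try:
        items = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a Python-style literal: leave the text as it is.
        return text.strip()
    if not isinstance(items, (list, tuple)):
        return text.strip()
    return ", ".join(str(item).strip() for item in items)


def clean(rows):
    """Apply the three steps: deduplicate, convert to numbers, flatten lists."""
    cleaned = []
    for row in drop_duplicates(rows):
        cleaned.append({
            "Course": row["Course"],
            "Duration": first_number(row["Duration"]),
            "Review": first_number(row["Review"]),
            "Skills": list_text_to_plain(row["Skills"]),
        })
    return cleaned


cleaned_rows = clean(raw_rows)
```

The same helper functions could be applied column by column in a dataframe library. Plain dictionaries are used here only to keep the sketch free of dependencies.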
