This project focuses on the data cleaning process for the Coursera dataset. The raw dataset can be found on Kaggle.
This repository includes:
- README.md
- LICENSE
- Data Cleaning - CourseraDataset.ipynb: the Jupyter Notebook file that contains all the code for the cleaning process.
- CourseraDataset-Clean.csv: the cleaned dataset produced by the cleaning process.
During this cleaning process, I was able to:
- Remove 900 duplicated values.
- Convert text fields Duration and Review into numbers for further analysis.
- Clean text fields that were initially displayed as lists.
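
As a rough illustration of the three steps above, here is a minimal pandas sketch. It is not the notebook's actual code: the sample rows, the `Course Title` and `Skills` columns, the value formats, and the helper names `first_number` and `list_text_to_plain` are all assumptions made for this example.

```python
import ast
import re

import pandas as pd

# Hypothetical sample rows; the real dataset's columns and value formats may differ.
raw = pd.DataFrame(
    {
        "Course Title": ["Intro to SQL", "Intro to SQL", "Python Basics"],
        "Duration": ["Approx. 12 hours", "Approx. 12 hours", "Approx. 8 hours"],
        "Review": ["1,234 reviews", "1,234 reviews", "56 reviews"],
        "Skills": ["['SQL', 'Databases']", "['SQL', 'Databases']", "['Python']"],
    }
)


def first_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(text))
    return float(match.group().replace(",", "")) if match else None


def list_text_to_plain(text):
    """Turn a stringified list such as "['a', 'b']" into "a, b"."""
    try:
        items = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return text  # not a list-like string; leave it unchanged
    if isinstance(items, (list, tuple)):
        return ", ".join(str(item) for item in items)
    return text


# Step 1: drop rows that are exact duplicates of an earlier row.
clean = raw.drop_duplicates().reset_index(drop=True)

# Step 2: convert the Duration and Review text fields into numbers.
clean["Duration"] = clean["Duration"].map(first_number)
clean["Review"] = clean["Review"].map(first_number)

# Step 3: flatten list-like text into plain comma-separated text.
clean["Skills"] = clean["Skills"].map(list_text_to_plain)
```

Extracting the first number with a regular expression is only one possible approach; if a field mixes units (for example hours and minutes), the conversion would need extra handling.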