Python 3 projects using data wrangling tools and techniques, such as web scraping, data cleansing, data integration and text pre-processing, to prepare data for analytics.

Agewerc/data-wrangling

Data Wrangling

This documentation covers four projects developed by me and Cristiana Gewerc for the Data Wrangling unit of Monash MDS. The main topics covered in these projects were:

  • parse data in the required format;
  • assess the quality of data for problem identification;
  • resolve data quality issues ready for the data analysis process;
  • integrate data sources for data enrichment;
  • document the wrangling process for professional reporting;
  • write program scripts for data wrangling processes.

In a nutshell, the projects were developed in Python 3 using Jupyter Notebook and cover:

  1. parsing-data

Extracting data from semi-structured text files using only the re and pandas libraries. Takes a TXT file and generates a JSON file and a CSV file.

  2. text-preprocessing

Extracting a set of published papers from an unstructured format, preprocessing them, and converting them into numerical representations.

  3. cleansing-raw-data

Outlier analysis and removal, missing data imputation, and fixing data anomalies.

  4. data-integration-reshaping

Integrating multiple data sources, including web-scraped data, XML files, Shapefiles, TXT, GTFS data, CSV and XLSX.
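
To give a feel for the kind of work in the parsing-data project, here is a minimal sketch of turning semi-structured text into JSON and CSV with a regular expression. It is not the code from the notebooks: the record layout, the field names (`id`, `name`, `score`) and the helpers `parse_records`, `to_json` and `to_csv` are all invented for illustration. The project itself uses re and pandas; this sketch swaps pandas for the standard library `json` and `csv` modules so it runs without extra installs.

```python
import csv
import io
import json
import re

# Hypothetical semi-structured input. The real project files are not shown
# in this README, so this layout is an assumption made for illustration.
RAW_TEXT = """\
ID: 001 | Name: Alice Smith | Score: 87.5
ID: 002 | Name: Bob Lee | Score: 91
malformed line that should be skipped
ID: 003 | Name: Carol Diaz | Score: 78.25
"""

# One named group per field; whitespace around the separators is tolerated.
RECORD_RE = re.compile(
    r"^ID:\s*(?P<id>\d+)\s*\|"
    r"\s*Name:\s*(?P<name>[^|]+?)\s*\|"
    r"\s*Score:\s*(?P<score>\d+(?:\.\d+)?)\s*$"
)


def parse_records(text):
    """Return one dict per line that matches RECORD_RE."""
    records = []
    for line in text.splitlines():
        match = RECORD_RE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        row = match.groupdict()
        row["score"] = float(row["score"])  # keep id as text to preserve leading zeros
        records.append(row)
    return records


def to_json(records):
    """Serialise the records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialise the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "name", "score"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = parse_records(RAW_TEXT)
json_text = to_json(records)
csv_text = to_csv(records)
```

With pandas, the last step could instead build a `DataFrame` from `records` and call its `to_json` and `to_csv` methods. Writing `json_text` and `csv_text` to files is left out so the sketch has no side effects.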
