This documentation covers four projects developed by Cristiana Gewerc and me for the Data Wrangling unit of the Monash MDS. The main topics covered in these projects were:
- parse data in the required format;
- assess the quality of data for problem identification;
- resolve data quality issues to prepare the data for analysis;
- integrate data sources for data enrichment;
- document the wrangling process for professional reporting;
- write program scripts for data wrangling processes.
In a nutshell, the projects were developed in Jupyter Notebook (Python 3) and cover the following:
parsing-data: Extraction of data from semi-structured text files using only the `re` and `pandas` libraries. Takes a `TXT` file and generates a `JSON` and a `CSV` file.
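A minimal sketch of this kind of pipeline, not the actual assignment code; the input file name and record layout are hypothetical:

```python
import re
import json
import pandas as pd

records = []
with open("input.txt") as fh:  # hypothetical input file
    for line in fh:
        # Assume each line looks like "id: 1; name: Alice; city: Melbourne"
        match = re.match(r"id:\s*(\d+);\s*name:\s*(\w+);\s*city:\s*(\w+)", line)
        if match:
            records.append({"id": int(match.group(1)),
                            "name": match.group(2),
                            "city": match.group(3)})

# Write the parsed records to JSON and CSV
with open("output.json", "w") as fh:
    json.dump(records, fh, indent=2)

pd.DataFrame(records).to_csv("output.csv", index=False)
```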
text-preprocessing: Extraction of a set of published papers from an unstructured format, preprocessing, and conversion into numerical representations.
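A minimal sketch of turning raw paper text into a count-based numerical representation; the sample texts are placeholders and scikit-learn's `CountVectorizer` is used here only for illustration, not necessarily the library choice in the notebook:

```python
from sklearn.feature_extraction.text import CountVectorizer

papers = [
    "Data wrangling transforms raw data into a usable form.",
    "Preprocessing text includes tokenisation and normalisation.",
]  # hypothetical stand-ins for the extracted paper texts

# Lowercase, tokenise, drop stop words, and build a document-term count matrix
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
counts = vectorizer.fit_transform(papers)

print(vectorizer.get_feature_names_out())  # vocabulary terms
print(counts.toarray())                    # per-document term counts
```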
cleansing-raw-data: Outlier analysis and removal, missing-data imputation, and fixing of data anomalies.
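A minimal sketch of these cleansing steps with pandas; the `price` column, values, and IQR threshold are illustrative, not taken from the project data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"price": [10.0, 12.5, np.nan, 11.0, 950.0, 9.5]})

# Flag outliers with the IQR rule
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Remove outliers, then impute remaining missing values with the median
df = df[in_range | df["price"].isna()]
df["price"] = df["price"].fillna(df["price"].median())
print(df)
```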
data-integration-reshaping: Integration of multiple data sources, including web-scraped data, XML files, shapefiles, TXT, GTFS data, CSV, and XLSX.
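A minimal sketch of combining a few of the listed source types; the file names, columns, and join key are hypothetical, and shapefile/GTFS handling (e.g. via geopandas) is omitted for brevity:

```python
import pandas as pd

csv_df = pd.read_csv("properties.csv")    # tabular CSV source
xlsx_df = pd.read_excel("suburbs.xlsx")   # Excel source (needs openpyxl)
xml_df = pd.read_xml("schools.xml")       # XML source (pandas >= 1.3)

# Integrate the sources on a shared key, then reshape into long format
merged = (csv_df
          .merge(xlsx_df, on="suburb", how="left")
          .merge(xml_df, on="suburb", how="left"))
tidy = merged.melt(id_vars=["suburb"], var_name="attribute", value_name="value")
print(tidy.head())
```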