More tutorials can be found here
Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.
This repository presents four wrangling projects, on numeric and text data, that turn unstructured data into insights.
You can view the code and my write-up by clicking the link in each project title below.
- Wrangle the U.S. Chronic Disease Indicators (CDI) data set
- Produce some summary statistics
- Visualize the correlation between binge drinking prevalence and poverty in U.S. states
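
As a rough illustration of the summary-statistics and correlation steps in the CDI project above, here is a minimal sketch using only the Python standard library. The rows, the column meanings, and the helper names `summarize` and `pearson` are assumptions made for this example; the values are invented and are not taken from the CDI data set, and the project's actual code may use different tools.

```python
import statistics

# Hypothetical (state, binge_drinking_pct, poverty_pct) rows.
# These values are invented for illustration, NOT real CDI figures.
rows = [
    ("A", 18.0, 10.0),
    ("B", 16.0, 12.0),
    ("C", 14.0, 14.0),
    ("D", 12.0, 16.0),
    ("E", 10.0, 18.0),
]


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


binge = [r[1] for r in rows]
poverty = [r[2] for r in rows]

binge_summary = summarize(binge)
r_value = pearson(binge, poverty)

print(binge_summary)
print(f"correlation: {r_value:.2f}")
```

The sign and size of the coefficient here only reflect the made-up rows; with the real data, a scatter plot alongside the coefficient is usually more informative than the number alone.
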
- Build a pipeline on a World Bank data set
- Explore the relationship between infant mortality and GDP per capita over time
- Group data by region/country to compare with the overall regression
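
The idea of comparing per-group fits with the overall regression in the World Bank project above can be sketched as follows. This is a hedged example, not the project's pipeline: the records, the region names, the units, and the helper `ols` are all hypothetical, and a plain least-squares line stands in for whatever model the write-up actually uses.

```python
from collections import defaultdict


def ols(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Invented (region, gdp_per_capita, infant_mortality) records,
# in arbitrary units; NOT real World Bank values.
records = [
    ("North", 1.0, 50.0),
    ("North", 2.0, 40.0),
    ("North", 3.0, 30.0),
    ("South", 4.0, 28.0),
    ("South", 5.0, 26.0),
    ("South", 6.0, 24.0),
]

# Overall regression across every record.
overall = ols([r[1] for r in records], [r[2] for r in records])

# Group the records by region, then fit each group separately.
groups = defaultdict(list)
for region, gdp, mortality in records:
    groups[region].append((gdp, mortality))

by_region = {
    region: ols([g for g, _ in pairs], [m for _, m in pairs])
    for region, pairs in groups.items()
}

print(f"overall slope: {overall[0]:.2f}")
for region, (slope, intercept) in by_region.items():
    print(f"{region}: slope={slope:.2f}, intercept={intercept:.2f}")
```

Printing the group slopes next to the overall slope makes it easy to spot regions whose trend differs from the pooled one.
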
- Select words with basic rules
- Select words with regular expression
- Produce a basic frequency summary
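
The three steps listed for the word-selection project above (rule-based selection, regular-expression selection, and a frequency summary) might look like this in Python. The sample sentence, the length threshold, and the `ing` pattern are assumptions chosen for illustration, not the rules used in the project.

```python
import re
from collections import Counter

# Hypothetical sample text, made up for this sketch.
text = "Data wrangling turns raw data into tidy data. Wrangling takes time!"

# Basic rules: split on whitespace, strip punctuation, lowercase.
words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
words = [w for w in words if w]  # drop tokens that were only punctuation

# Rule-based selection: keep words longer than four characters.
long_words = [w for w in words if len(w) > 4]

# Regex selection: whole words ending in "ing".
ing_words = re.findall(r"\b\w+ing\b", text.lower())

# Frequency summary: count every word and list the most common ones.
freq = Counter(words)
top_two = freq.most_common(2)

print(long_words)
print(ing_words)
print(top_two)
```
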
- Extract multiple formats of data from strings (noun, time, number, etc.)
- Manipulate natural language strings without preset tokenization
- Visualize and generate insights from text analysis
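
To make the extraction step in the string-processing project above concrete, here is a small sketch that pulls clock times and numbers out of a raw string with regular expressions, without tokenizing the text first. The patterns, the `extract` helper, and the sample sentence are hypothetical; noun extraction is left out because it normally needs a part-of-speech tagger, and the project's own approach may differ.

```python
import re

# 24-hour clock times such as 9:05 or 14:30.
TIME_RE = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")

# Integers or decimals that are not part of a word, a time, or a longer number.
NUMBER_RE = re.compile(r"(?<![\w:.])\d+(?:\.\d+)?(?![\w:])")


def extract(text):
    """Return the times (as strings) and numbers (as floats) found in text."""
    times = [m.group(0) for m in TIME_RE.finditer(text)]
    numbers = [float(m.group(0)) for m in NUMBER_RE.finditer(text)]
    return {"times": times, "numbers": numbers}


# Invented sample string for illustration.
sample = "Meeting at 14:30 with 12 people, budget 3.5 million, ends 16:00."
result = extract(sample)
print(result)
```

The lookbehind and lookahead in `NUMBER_RE` keep the hour and minute parts of a time from being counted again as standalone numbers.
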