Yer1k/Bash-command-line-tool-to-clean-and-truncate-data

 
 


Bash command-line tool to clean and truncate data

The objective of this project is to build a Bash command-line tool that performs useful data preparation tasks: cleaning data (especially handling rows with missing values) and truncating data based on specific criteria.

Motivation

The motivation for this project is to reduce the time spent manually subsetting data by year or by other specific criteria.

For example, the sample dataset contains life expectancy data for all countries from 2000 to 2015. Filtering manually in Excel on a country's status (developing or developed) and saving the filtered file twice is manageable, but what about doing it more than five times? Filtering each year by hand, then using "Save As" and typing a new file name, would have to be repeated 16 times, which is tedious and time-consuming.

Features

  • Drop rows with missing values
  • Split data by country status (Developing & Developed)
  • Split data by year
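
The three features above could be sketched in Bash roughly as follows. This is a minimal illustration, not the repository's actual script: the function names `drop_missing` and `split_by_column` are hypothetical, and the sketch assumes a plain comma-separated file with a header row and no quoted fields that contain commas.

```shell
#!/usr/bin/env bash
# Sketch only: function names and file layout are assumptions,
# not the tool's real interface.
set -euo pipefail

# Print the CSV without any data row that has an empty field.
# The header row (line 1) is always kept.
drop_missing() {
    awk -F',' '
        NR == 1 { print; next }
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[[:space:]]*$/) next   # empty field: skip this row
            }
            print
        }
    ' "$1"
}

# Write one CSV per distinct value of a named column, each with the header.
# Usage: split_by_column <input.csv> <column name> <output dir>
split_by_column() {
    local input="$1" column="$2" outdir="$3"
    mkdir -p "$outdir"
    awk -F',' -v col="$column" -v dir="$outdir" '
        NR == 1 {
            header = $0
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        {
            out = dir "/" $idx ".csv"
            if (!(out in seen)) { print header > out; seen[out] = 1 }
            print > out   # awk keeps the file open, so later rows are appended
        }
    ' "$input"
}
```

Assuming the dataset is saved as `life_expectancy.csv` (a hypothetical file name) and has columns named `Status` and `Year`, running `drop_missing life_expectancy.csv > clean.csv`, then `split_by_column clean.csv Status by_status` and `split_by_column clean.csv Year by_year`, would produce one file per status and one per year. Looking the column up by its header name avoids hard-coding a column position.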

Flowchart

(Flowchart image)

Dataset

The Global Health Observatory (GHO) data repository under the World Health Organization (WHO) keeps track of the health status of all countries, as well as many other related factors. The datasets are made available to the public for the purpose of health data analysis. The dataset on life expectancy and health factors for 193 countries was collected from the same WHO data repository website, and the corresponding economic data was collected from the United Nations website.

https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who
