Disclaimer: For informative purposes only. I am not a professional nor do I claim to be one. This is a school project that I extended upon in my spare time. I hold no responsibility for data accuracy or data gathering. Full credit goes to good people that made these datasets, links bellow.
That said, dataset created is free for you to use.
Dataset can be found in data folder
Goal of this project was to create a dataset containing facts about countries and preventive measures taken by those countries during COVID-19 pandemic, so far. For these purposes, I combined data from multiple sources to create a simple data warehouse that stores data on new measures implemented for each day in the period from 01-01-2020 to 02-06-2020, for every country. With this I hope to provide a single dataset for analysing the effectiveness of preventive measures worldwide.
Data for this project was combined from 4 sources:
-
ACAPS COVID-19: Government Measures Dataset by The Humanitarian Data Exchange - puts together all the measures implemented by governments worldwide in response to the Coronavirus pandemic.
-
COVID-19 Coronavirsu data by EU Open Data Portal - data on the geographic distribution of COVID-19 cases worldwide.
-
Oxford COVID-19 Government Response Tracker (OxCGRT) by Hale et. al. - systematically collects information on several different common policy responses that governments have taken to respond to the pandemic on 17 indicators such as school closures and travel restrictions.
-
The World by Income and Region by The World Bank - The World Bank classifies economies for analytical purposes into four income groups: low, lower-middle, upper-middle, and high income.
Image below shows ETL process. The transforming part was done with Python Pandas, code can be found in ETL.ipynb document in this repo. Main part of ETL was to clean the data and to split it into meaningful tables.
Data is modeled in the form of star schema consisting of 4 dimension tables and 1 fact table. Dimension tables contain: country, date and measures data. There is also one junk dimension (additional_dim) that contains additional data for each day. (Can be used in the future to get more insights)
Column name | Description |
---|---|
country_name | Name of the country. |
country_code | 3 letter code unique to every country. |
geo_id | Similar to country code. |
continent | Continent that country resides on. Similar (or same as) to region. |
region | Region of the world in which the country resides. |
population_data_2018 | Population number (data from 2018.). |
income_group_2020 | Income group of the country, as categorised by The World Bank (for the year 2020.). |
Column name | Description |
---|---|
measure | Name of the measure. Categorical value, there is 34 of them. |
measure_category | Each measure is categorised in one of six categories. |
Column name | Description |
---|---|
additional_id | Reference to additional_dim table. |
date_id | Reference to date_dim table. |
measure_id | Reference to measure_dim table. |
country_id | Reference to country_dim table. |
cases | New cases for each day for each county. |
deaths | Number of death for each day for each country. |
stringency_index | Index developed by OxCGRT, shows how strict government response measures are. For each day for each country. |
government_response_index | Index developed by OxCGRT, shows government responsivnes for each day for each country. |
containement_health_index | Another index by OxCGRT, more info on link |
economic_support_index | Another index by OxCGRT, more info on link |