- This assignment is due by 10pm on Monday 10/12/20. Please upload it using your personal GitHub repository for this class.
- For this assignment, please reproduce this markdown file using R markdown.
- Please name your R markdown file `assignment_3.Rmd` and the knitted markdown file `assignment_3.md`.
- Pay attention to all the formatting in this file, including bullet points, bolded characters, inserted code chunks, headings, text colors, blank lines, etc. You will need to reproduce all of these.
- Have all your code embedded within the R markdown file, and show both your code and plots in the knitted markdown file.
- When a verbal response is needed, answer by replacing the parts that say “Write your response here”.
- Use R Markdown functionalities to hide messages and warnings when needed. (Suggestion: messages and warnings can often be informative and important, so please examine them carefully and only turn them off when you finish the exercise.)
- You can start by making a copy of the R markdown template that you created as last week’s assignment and work from there.
- First, load all the required packages with the following code. Install them if they are not installed yet.

library(tidyverse)
library(knitr)
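
One way to handle the "install them if they are not installed yet" step is to check for each package before installing. The following is a minimal sketch, not part of the original assignment: `missing_packages()` is a hypothetical helper, the CRAN mirror URL is an assumption, and `knitr::opts_chunk$set()` is only one of several ways to hide messages and warnings once you have examined them.

```r
# Hypothetical helper (not from the assignment): returns the packages in
# `pkgs` that cannot be found in the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("tidyverse", "knitr")
to_install <- missing_packages(required)

# Install only what is missing. The mirror below is an assumption;
# any CRAN mirror works.
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Once the exercise is finished, hide messages and warnings in all
# subsequent chunks of the knitted document.
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```

The same two options can also be set on an individual chunk instead of globally, which keeps messages visible everywhere else.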
This exercise explores a dataset containing the human development index (`HDI`) and corruption perception index (`CPI`) of 173 countries across 6 different regions around the world: Americas, Asia Pacific, Eastern Europe and Central Asia (`East EU Cemt Asia`), Western Europe (`EU W. Europe`), Middle East and North Africa (`MENA`), and Sub-Saharan Africa (`SSA`). (Note: the larger `CPI` is, the less corrupt the country is perceived to be.)
First, we load the data using the following code.
economist_data <- read_csv("https://raw.githubusercontent.com/nt246/NTRES6940-data-science/master/datasets/EconomistData.csv")
X1 | Country | HDI.Rank | HDI | CPI | Region |
---|---|---|---|---|---|
1 | Afghanistan | 172 | 0.398 | 1.5 | Asia Pacific |
2 | Albania | 70 | 0.739 | 3.1 | East EU Cemt Asia |
3 | Algeria | 96 | 0.698 | 2.9 | MENA |
4 | Angola | 148 | 0.486 | 2.0 | SSA |
5 | Argentina | 45 | 0.797 | 3.0 | Americas |
6 | Armenia | 86 | 0.716 | 2.6 | East EU Cemt Asia |
1.2 Explore the relationship between human development index (`HDI`) and corruption perception index (`CPI`) with a scatter plot like the following.
1.4 Color the points in the previous plot according to the `Region` variable, and set the size of points to 2.
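
As a rough illustration of steps 1.2 and 1.4, the sketch below builds both scatter plots from the six rows shown in the table earlier in this document, so that it runs without a network connection. The object names (`economist_sample`, `p_scatter`, `p_colored`) are made up for this example, and putting `CPI` on the x-axis and `HDI` on the y-axis is an assumption, since the reference plot is not reproduced here.

```r
library(ggplot2)
library(tibble)

# The six rows printed in the table earlier in this document.
economist_sample <- tribble(
  ~Country,      ~HDI,  ~CPI, ~Region,
  "Afghanistan", 0.398, 1.5,  "Asia Pacific",
  "Albania",     0.739, 3.1,  "East EU Cemt Asia",
  "Algeria",     0.698, 2.9,  "MENA",
  "Angola",      0.486, 2.0,  "SSA",
  "Argentina",   0.797, 3.0,  "Americas",
  "Armenia",     0.716, 2.6,  "East EU Cemt Asia"
)

# 1.2: basic scatter plot (the axis assignment is an assumption).
p_scatter <- ggplot(economist_sample, aes(x = CPI, y = HDI)) +
  geom_point()

# 1.4: Region is a variable, so it is mapped inside aes();
# size is a constant, so it is set outside aes().
p_colored <- ggplot(economist_sample, aes(x = CPI, y = HDI, color = Region)) +
  geom_point(size = 2)
```

Printing `p_scatter` or `p_colored` draws the plot. With the full `economist_data` loaded, the same calls should work after swapping in that data frame.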
1.11 Show the distribution of `HDI` in each region using a box plot. Set the transparency of these boxes to 0.5 and do not show outlier points with the box plot. Instead, show all data points for each country in the same plot. (Hint: `geom_jitter()` or `position_jitter()` might be useful.)
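
One possible approach to 1.11 is sketched below, again using only the six countries listed in the table earlier in this document. The names `hdi_sample` and `p_box` and the jitter width of 0.2 are assumptions chosen for illustration. With so few rows most boxes collapse to a single line; the full dataset gives proper boxes.

```r
library(ggplot2)
library(tibble)

# HDI and Region for the six countries printed earlier in this document.
hdi_sample <- tribble(
  ~Country,      ~HDI,  ~Region,
  "Afghanistan", 0.398, "Asia Pacific",
  "Albania",     0.739, "East EU Cemt Asia",
  "Algeria",     0.698, "MENA",
  "Angola",      0.486, "SSA",
  "Argentina",   0.797, "Americas",
  "Armenia",     0.716, "East EU Cemt Asia"
)

# Jitter is random, so fix the seed to make the plot reproducible.
set.seed(1)

p_box <- ggplot(hdi_sample, aes(x = Region, y = HDI)) +
  # alpha = 0.5 makes the boxes semi-transparent;
  # outlier.shape = NA suppresses the box plot's own outlier points.
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # Show every country instead; spread points horizontally only,
  # so the HDI values themselves are not altered.
  geom_jitter(width = 0.2, height = 0)
```

Setting `height = 0` matters here: the default jitter would also move points vertically and misreport the `HDI` values.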
1.13 You have now created a variety of different plots of the same dataset. Which of your plots do you think are the most informative? Describe briefly the major trends that you see in the data.
Answer: Write your response here.
This exercise uses the dataset `economics` from the ggplot2 package. It was produced from US economic time series data available from http://research.stlouisfed.org/fred2. It describes the number of unemployed persons (`unemploy`), among other variables, in the US from 1967 to 2015.
head(economics) %>% kable()
date | pce | pop | psavert | uempmed | unemploy |
---|---|---|---|---|---|
1967-07-01 | 506.7 | 198712 | 12.6 | 4.5 | 2944 |
1967-08-01 | 509.8 | 198911 | 12.6 | 4.7 | 2945 |
1967-09-01 | 515.6 | 199113 | 11.9 | 4.6 | 2958 |
1967-10-01 | 512.2 | 199311 | 12.9 | 4.9 | 3143 |
1967-11-01 | 517.4 | 199498 | 12.8 | 4.7 | 3066 |
1967-12-01 | 525.1 | 199657 | 11.8 | 4.8 | 3018 |
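
Before working with a time series like this one, it can help to confirm the period it covers and look at the variable of interest over time. The sketch below is only an illustration and is not one of the assignment's questions; `date_range` and `p_unemploy` are names invented for this example.

```r
library(ggplot2)

# Earliest and latest observation dates in the series.
date_range <- range(economics$date)

# Line plot of the number of unemployed persons over time
# (illustrative only; the exercise may ask for a different plot).
p_unemploy <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()
```

Printing `date_range` shows the first and last months in the data, and printing `p_unemploy` draws the line plot.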