---
title: "Harm From Severe Weather Events To Population Health And Economy In United States"
author: "JZstats"
date: "5/13/2020"
output: md_document
---
# About This Repository
This repository was created to support the project:
* [Harm From Severe Weather Events To Population Health And Economy In United States
](https://jzstats.github.io/Reproducible-Research--2nd-Assignment)
It contains all the material relevant to the project:
1. **RepRes_analysis.Rmd**
* The main script of the project, which contains all the code and
narrative text required to produce:
1. **RepRes_analysis.md**
* The Markdown version of the report.
Reading the analysis from this file is not advised,
as it cannot present all the included code effectively
(extra chunks of HTML were used for various purposes).
In addition, the length of the report
makes this format impractical.
2. **docs/Report.html**
* The version of the report obtained
by rendering *RepRes_analysis.Rmd* with the script
**render\_\_\_\_\_RepRes_analysis.R**, which produces
a *bookdown* version of the report
powered by the *rmdformats* package.
It is a visually appealing, fully functional, and practical version of the
report that includes all the code used for the analysis
(folded by default).
2. **repdata_data_StormData.csv.bz2**
* The file with the compressed, unprocessed data
on which this analysis is based.
It contains data from the *Storm Events Dataset*,
gathered and made publicly available
by the U.S. National Oceanic and Atmospheric Administration (NOAA).
3. **supplemental_information**
* Contains files with information on the *Storm Events Dataset*
that had a major impact on the analysis:
* **supplemental_information/NATIONAL WEATHER SERVICE INSTRUCTION 10-1605 (AUGUST 17, 2007)**
* General information and directions for the *Storm Events Dataset*.
* **supplemental_information/NCDC Storm Events-FAQ Page.pdf**
* Frequently asked questions about the *Storm Events Dataset*.
* **supplemental_information/The-History-of-the-Storm-Events-Database.docx**
* Detailed information on the history of the *Storm Events Dataset*.
4. **outputs**
* Contains the outputs that were produced (as a side effect)
during the execution of the script:
* **outputs/processed data**
* Contains the resulting table with the processed data,
that was obtained after the unprocessed data was loaded in R
and processed through the data processing pipeline:
* **outputs/processed data/table_with_the_processed_data.R**
* **outputs/harm_on_population_health**
* Contains the outputs with results for the harm on population health:
* **outputs/harm_on_population_health/figures**
* Contains *Figure 1*, an overview of the
harm to population health by weather event type:
* **outputs/harm_on_population_health/figures/figure_1.png**
* **outputs/harm_on_population_health/results**
* Contains the files with the tables with the results
for the harm on population health by each weather event type
for each of the perspectives examined in the analysis:
* **outputs/harm_on_population_health/results/summary_____harm_on_population_health______fatalities.R**
* **outputs/harm_on_population_health/results/summary_____harm_on_population_health______injuries.R**
* **outputs/harm_on_population_health/results/summary_____harm_on_population_health______casualties.R**
* **outputs/harm_on_economy**
* Contains the outputs with results for the harm on economy:
* **outputs/harm_on_economy/figures**
* Contains *Figure 2*, an overview of the
harm to the economy by weather event type:
* **outputs/harm_on_economy/figures/figure_2.png**
* **outputs/harm_on_economy/results**
* Contains the files with the tables with the results
for the harm on economy by each weather event type
for each of the perspectives examined in the analysis:
* **outputs/harm_on_economy/results/summary_____harm_on_population_health______property_damage.R**
* **outputs/harm_on_economy/results/summary_____harm_on_population_health______crop_damage.R**
* **outputs/harm_on_economy/results/summary_____harm_on_population_health______economic_damage.R**
* **outputs/reproducibility_support**
* Contains information that can assist any attempt
to reproduce the analysis.
* **outputs/reproducibility_support/r_session**
* Information on the R session and the options that were
active when the original analysis took place:
* **outputs/reproducibility_support/r_session/session_info.R**
* **outputs/reproducibility_support/r_session/r_options.R**
* **outputs/reproducibility_support/MD5_checksum**
* Contains files with the MD5 checksums for the
unprocessed data, processed data, and the tables with the results
that were produced during the original execution of the script:
* **outputs/reproducibility_support/MD5_checksum/unprocessed_data_____MD5_checksum.txt**
* **outputs/reproducibility_support/MD5_checksum/processed_data_____MD5_checksum.txt**
* **outputs/reproducibility_support/MD5_checksum/results_____MD5_checksum.txt**
5. **docs**
* Contains the files used to build
the GitHub Pages site that showcases this project:
* https://jzstats.github.io/Reproducible-Research--2nd-Assignment
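
The checksum files listed under **outputs/reproducibility_support/MD5_checksum** are meant to help confirm that a reproduction starts from, and arrives at, the same files. The following is a minimal sketch of such a check in base R, not the project's own code. The helper name `verify_md5` is hypothetical, the layout of the checksum text files is not described here, and a temporary file stands in for the real data so the sketch runs without the repository.

```r
# Hypothetical helper: compare a file's MD5 checksum with a recorded value.
verify_md5 <- function(file_path, expected_md5) {
  actual_md5 <- unname(tools::md5sum(file_path))
  identical(tolower(actual_md5), tolower(trimws(expected_md5)))
}

# Demonstration on a temporary file (a stand-in for the real data file).
demo_file <- tempfile(fileext = ".txt")
writeLines("storm data placeholder", demo_file)

# Record the checksum once (the role played by the checksum text files) ...
recorded_md5 <- unname(tools::md5sum(demo_file))

# ... and verify it later, before trusting the file.
checksum_matches <- verify_md5(demo_file, recorded_md5)

# Any change to the file's contents changes its checksum.
writeLines("tampered placeholder", demo_file)
checksum_after_change <- verify_md5(demo_file, recorded_md5)
```

In this sketch `checksum_matches` is `TRUE` and `checksum_after_change` is `FALSE`. With the real files, the recorded value would be read from the corresponding text file instead of being computed on the spot.
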
<br>
# About The Assignment
This project was created for the 2nd peer-graded assignment of:
> Course 5: Reproducible Research,
> from Data Science Specialization,
> by Johns Hopkins University,
> at Coursera
The course is taught by:
* Jeff Leek, PhD
* Roger D. Peng, PhD
* Brian Caffo, PhD
As stated by the course instructors:
> The basic goal of this assignment is to explore the NOAA Storm Database
and answer some basic questions about severe weather events.
You must use the database to answer the questions below
and show the code for your entire analysis.
Your analysis can consist of tables, figures, or other summaries.
You may use any R package you want to support your analysis.
The assignment asks to address two questions:
> Your data analysis must address the following questions:
>
> **Question 1:** Across the United States, which types of events
(as indicated in the EVTYPE variable) are most harmful with respect to
population health?
>
> **Question 2:** Across the United States, which types of events have
the greatest economic consequences?
based on the observations in the [supplied dataset](https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2):
> The data for this assignment come in the form of a comma-separated-value file
compressed via the bzip2 algorithm to reduce its size.
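
Because the file is bzip2-compressed, it does not need to be unpacked by hand before loading it in R. The sketch below illustrates the idea with base R only. It is an illustration under assumptions, not the code used in the report: it builds a tiny compressed CSV in a temporary location, and the `FATALITIES` column and all values are made up for the example.

```r
# Build a tiny bzip2-compressed CSV so the sketch runs without the real data.
toy_path <- tempfile(fileext = ".csv.bz2")
toy_con <- bzfile(toy_path, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), toy_con)
close(toy_con)

# read.csv() accepts a connection, so bzfile() decompresses while reading.
toy_data <- read.csv(bzfile(toy_path), stringsAsFactors = FALSE)
```

The same call pattern should apply to the real file, for example `read.csv(bzfile("repdata_data_StormData.csv.bz2"))`, although loading the full dataset takes noticeably longer.
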
Some general guidelines and a tip were also provided:
> Consider writing your report as if it were to be read by a government or
municipal manager who might be responsible for preparing for severe weather events
and will need to prioritize resources for different types of events.
However, there is no need to make any specific recommendations in your report.
> The events in the database start in the year 1950 and end in November 2011.
In the earlier years of the database there are generally fewer events recorded,
most likely due to a lack of good records. More recent years should be considered
more complete.
The project deliberately adopts an educational approach,
aiming to produce a well-justified, self-explanatory product
that can serve as a guide for beginners on how to build a basic pipeline
that produces an analysis report from scratch.
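
To give a flavor of what the core step of such a pipeline can look like, here is a minimal sketch in base R that totals harm by event type and ranks the types. It is not the pipeline used in the report. The data are invented, `EVTYPE` is the variable named in the assignment, and the `FATALITIES` and `INJURIES` column names are assumptions made for the example.

```r
# Toy records; FATALITIES and INJURIES are assumed column names.
toy_events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Casualties combine fatalities and injuries for each record.
toy_events$CASUALTIES <- toy_events$FATALITIES + toy_events$INJURIES

# Sum each measure per event type.
harm_by_type <- aggregate(
  cbind(FATALITIES, INJURIES, CASUALTIES) ~ EVTYPE,
  data = toy_events,
  FUN = sum
)

# Rank event types from most to least harmful by total casualties.
harm_by_type <- harm_by_type[order(-harm_by_type$CASUALTIES), ]
```

A real analysis needs considerably more care before this step, such as cleaning inconsistent event type labels, which is why the full report documents each processing decision.
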