---
title: "Harm From Severe Weather Events To Population Health And Economy In United States"
author: "JZstats"
date: "4/26/2020"
output: md_document
always_allow_html: true
---
<font size="2">[read the SYNOPSIS](#ind-3--SYNOPSIS)</font>
<br>
<br>
***
# 1 TABLE OF CONTENTS {#ind-1--TABLE-OF-CONTENTS}
***
* [1 TABLE OF CONTENTS](#ind-1--TABLE-OF-CONTENTS)
* [2 PROLOGUE](#ind-2--PROLOGUE)
* [2.1 About The Assignment](#ind-2-1--About-The-Assignment)
* [2.2 About The Main Script](#ind-2-2--About-The-Main-Script)
* [2.3 About The Report](#ind-2-3--About-The-Report)
* [3 SYNOPSIS](#ind-3--SYNOPSIS)
* [4 STORM EVENTS DATASET](#ind-4--STORM-EVENTS-DATASET)
* [4.1 General Informations](#ind-4-1--General-Informations)
* [4.2 Points Of Interest](#ind-4-2--Points-Of-Interest)
* [4.2.1 Changes in the composition of weather event types](#ind-4-2-1--Changes-in-the-composition-of-weather-event-types)
* [4.2.2 Eligibility criteria for inclusion of weather events in the dataset](#ind-4-2-2--Eligibility-criteria-for-inclusion-of-weather-events-in-the-dataset)
* [5 PRELIMINARY ACTIVITIES](#ind-5--PRELIMINARY-ACTIVITIES)
* [5.1 Set The Random Seed](#ind-5-1--Set-The-Random-Seed)
* [5.2 Load All Required Libraries](#ind-5-2--Load-All-Required-Libraries)
* [5.3 Create All Required Directories](#ind-5-3--Create-All-Required-Directories)
* [5.4 Access The File With The Raw Data](#ind-5-4--Access-The-File-With-The-Raw-Data)
* [6 DATA PROCESSING](#ind-6--DATA-PROCESSING)
* [6.1 Load The Raw Data In R](#ind-6-1--Load-The-Raw-Data-In_R)
* [6.1.1 Create the table with the raw data](#ind-6-1-1--Create-the-table-with-the-raw-data)
* [6.1.2 Conduct post validation for the table with the raw data](#ind-6-1-2--Conduct-post-validation-for-the-table-with-the-raw-data)
* [6.1.3 Overview of the table with the raw data](#ind-6-1-3--Overview-of-the-table-with-the-raw-data)
* [6.2 Preprocess The Raw Data](#ind-6-2--Preprocess-The-Raw-Data)
* [6.2.1 Verify the prerequisites for the selected variables](#ind-6-2-1--Verify-the-prerequisites-for-the-selected-variables)
* [6.2.1.1 Verify the coercibility of the values for the selected variables](#ind-6-2-1-1--Verify-the-coercibility-for-the-values-at-the-selected-variables)
* [6.2.1.2 Verify the uniqueness of the key values](#ind-6-2-1-2--Verify-the-uniqueness-of-the-key-values)
* [6.2.2 Create the table with the preprocessed data](#ind-6-2-2--Create-the-table-with-the-preprocessed-data)
* [6.2.3 Conduct post validation for the table with the preprocessed data](#ind-6-2-3--Conduct-post-validation-for-the-table-with-the-preprocessed-data)
* [6.2.4 Overview of the table with the preprocessed data](#ind-6-2-4--Overview-of-the-table-with-the-preprocessed-data)
* [6.3 Extract The Target Data Subset](#ind-6-3--Extract-The-Target-Data-Subset)
* [6.3.1 Identify the target subset of observations](#ind-6-3-1--Identify-the-target-subset-of-observations)
* [6.3.1.1 Verify the consistency of date format](#ind-6-3-1-1--Verify-the-consistency-of-date-format)
* [6.3.1.2 Identify the eligible observations](#ind-6-3-1-2--Identify-the-eligible-observations)
* [6.3.2 Create the table with the target data subset](#ind-6-3-2--Create-the-table-with-the-target-data-subset)
* [6.3.3 Conduct post validation for the table with the target data subset](#ind-6-3-3--Conduct-post-validation-for-the-table-with-the-target-data-subset)
* [6.3.4 Overview of the table with the target data subset](#ind-6-3-4--Overview-of-the-table-with-the-target-data-subset)
* [6.4 Conduct In-Record Data Validation](#ind-6-4--Conduct-In-Record-Data-Validation)
* [6.4.1 Introduce information from the Storm Data Documentation](#ind-6-4-1--Introduce-information-from-the-Strom-Data-Documentation)
* [6.4.1.1 Valid values for the EVTYPE variable](#ind-6-4-1-1--Valid-values-for-the-EVTYPE-variable)
* [6.4.1.2 Valid values for the PROPDMGEXP variable](#ind-6-4-1-2--Valid-values-for-the-PROPDMGEX-variable)
* [6.4.1.3 Valid values for the CROPDMGEXP variable](#ind-6-4-1-3--Valid-values-for-the-CROPDMGEX-variable)
* [6.4.2 Conduct in-record data validation for each variable](#ind-6-4-2--Conduct-in-record-data-validation-for-each-variable)
* [6.4.3 Create the table with the in-record validated data](#ind-6-4-3--Create-the-table-with-the-in-record-validated-data)
* [6.4.4 Conduct post validation for the table with the in-record validated data](#ind-6-4-4--Conduct-post-validation-for-the-table-with-the-in-record-validated-data)
* [6.4.5 Overview of the table with the in-record validated data](#ind-6-4-5--Overview-of-the-table-with-the-in-record-validated-data)
* [6.5 Impute Missing Values](#ind-6-5--Impute-Missing-Values)
* [6.5.1 Impute missing values at the variable EVTYPE](#ind-6-5-1--Impute-missing-values-at-the-variable-EVTYPE)
* [6.5.1.1 Examine the invalid values from the variable EVTYPE](#ind-6-5-1-1--Examine-the-invalid-values-from-the-variable-EVTYPE)
* [6.5.1.2 Associate plausible substitutions to the invalid values from the variable EVTYPE](#ind-6-5-1-2--Associate-plausible-substitutions-to-the-invalid-values-from-the-variable-EVTYPE)
* [6.5.1.3 Identify the imputable missing values at the variable EVTYPE](#ind-6-5-1-3--Identify-the-imputable-missing-values-at-the-variable-EVTYPE)
* [6.5.1.4 Substitute the imputable missing values at the variable EVTYPE](#ind-6-5-1-4--Substitute-the-imputable-missing-values-at-the-variable-EVTYPE)
* [6.5.2 Impute missing values at the variable PROPDMGEXP](#ind-6-5-2--Impute-missing-values-at-the-variable-PROPDMGEXP)
* [6.5.2.1 Examine the invalid values from the variable PROPDMGEXP](#ind-6-5-2-1--Examine-the-invalid-values-from-the-variable-PROPDMGEXP)
* [6.5.2.2 Associate plausible substitutions to the invalid values from the variable PROPDMGEXP](#ind-6-5-2-2--Associate-plausible-substitutions-to-the-invalid-values-from-the-variable-PROPDMGEXP)
* [6.5.2.3 Identify the imputable missing values at the variable PROPDMGEXP](#ind-6-5-2-3--Identify-the-imputable-missing-values-at-the-variable-PROPDMGEXP)
* [6.5.2.4 Substitute the imputable missing values at the variable PROPDMGEXP](#ind-6-5-2-4--Substitute-the-imputable-missing-values-at-the-variable-PROPDMGEXP)
* [6.5.3 Impute missing values at the variable CROPDMGEXP](#ind-6-5-3--Impute-missing-values-at-the-variable-CROPDMGEXP)
* [6.5.3.1 Examine the invalid values from the variable CROPDMGEXP](#ind-6-5-3-1--Examine-the-invalid-values-from-the-variable-CROPDMGEXP)
* [6.5.3.2 Associate plausible substitutions to the invalid values from the variable CROPDMGEXP](#ind-6-5-3-2--Associate-plausible-substitutions-to-the-invalid-values-from-the-variable-CROPDMGEXP)
* [6.5.3.3 Identify the imputable missing values at the variable CROPDMGEXP](#ind-6-5-3-3--Identify-the-imputable-missing-values-at-the-variable-CROPDMGEXP)
* [6.5.3.4 Substitute the imputable missing values at the variable CROPDMGEXP](#ind-6-5-3-4--Substitute-the-imputable-missing-values-at-the-variable-CROPDMGEXP)
* [6.5.4 Conduct post validation for the table with the imputed data](#ind-6-5-4--Conduct-post-validation-for-the-table-with-the-imputed-data)
* [6.5.5 Overview of the table with the imputed data](#ind-6-5-5--Overview-of-the-table-with-the-imputed-data)
* [6.6 Conduct Cross-Record Data Validation](#ind-6-6--Conduct-Cross-Record-Data-Validation)
* [6.6.1 Identify all valid observations](#ind-6-6-1--Identify-all-valid-observations)
* [6.6.2 Create the table with the cross-record validated data](#ind-6-6-2--Create-the-table-with-the-cross-record-validated-data)
* [6.6.3 Conduct post validation for table with the cross-record validated data](#ind-6-6-3--Conduct-post-validation-for-table-with-the-cross-record-validated-data)
* [6.6.4 Overview of the table with the cross-record validated data](#ind-6-6-4--Overview-of-the-table-with-the-cross-record-validated-data)
* [6.7 Produce The Processed Data](#ind-6-7--Produce-The-Processed-Data)
* [6.7.1 Create the table with the processed data](#ind-6-7-1--Create-the-table-with-the-processed-data)
* [6.7.2 Conduct post validation for the table with the processed data](#ind-6-7-2--Conduct-post-validation-for-the-table-with-the-processed-data)
* [7 PROCESSED DATA](#ind-7--PROCESSED-DATA)
* [7.1 Information For The Table With The Processed Data](#ind-7-1--Information-For-The-Table-With-The-Processed-Data)
* [7.2 Overview Of The Table With The Processed Data](#ind-7-2--Overview-Of-The-Table-With-The-Processed-Data)
* [7.3 Export The Table With The Processed Data](#ind-7-3--Export-The-Table-With-The-Processed-Data)
* [8 HARM ON POPULATION HEALTH](#ind-8--HARM-ON-POPULATION-HEALTH)
* [8.1 Harm On Population Health With Respect To Fatalities By Each Weather Event Type](#ind-8-1--Harm-On-Population-Health-With-Respect-To-Fatalities-By-Each-Weather-Event-Type)
* [8.1.1 Extract the target data for harm on population health with respect to fatalities](#ind-8-1-1--Extract-the-target-data-for-harm-on-population-health-with-respect-to-fatalities)
* [8.1.2 Process the target data for harm on population health with respect to fatalities](#ind-8-1-2--Process-the-target-data-for-harm-on-population-health-with-respect-to-fatalities)
* [8.1.3 Summarize the processed data for harm on population health with respect to fatalities by each weather event type](#ind-8-1-3--Summarize-the-processed-data-for-harm-on-population-health-with-respect-to-fatalities-by-each-weather-event-type)
* [8.1.4 Visualize the results of the summary for the harm on population health with respect to fatalities by each weather event type](#ind-8-1-4--Visualize-the-results-of-the-summary-for-the-harm-on-population-health-with-respect-to-fatalities-by-each-weather-event-type)
* [8.1.4.1 Create the components of Multiplot 1.1](#ind-8-1-4-1--Create-the-components-of-Multiplot-1-1)
* [8.1.4.1.1 Create The Plot 1.1.1](#ind-8-1-4-1-1--Create-The-Plot-1-1-1)
* [8.1.4.1.2 Create The Plot 1.1.2](#ind-8-1-4-1-2--Create-The-Plot-1-1-2)
* [8.1.4.1.3 Create The Plot 1.1.3](#ind-8-1-4-1-3--Create-The-Plot-1-1-3)
* [8.1.4.1.4 Create The Plot 1.1.4](#ind-8-1-4-1-4--Create-The-Plot-1-1-4)
* [8.1.4.2 Compose the Multiplot 1.1](#ind-8-1-4-2--Compose-the-Multiplot-1-1)
* [8.2 Harm On Population Health With Respect To Injuries By Each Weather Event Type](#ind-8-2--Harm-On-Population-Health-With-Respect-To-Injuries-By-Each-Weather-Event-Type)
* [8.2.1 Extract the target data for harm on population health with respect to injuries](#ind-8-2-1--Extract-the-target-data-for-harm-on-population-health-with-respect-to-injuries)
* [8.2.2 Process the target data for harm on population health with respect to injuries](#ind-8-2-2--Process-the-target-data-for-harm-on-population-health-with-respect-to-injuries)
* [8.2.3 Summarize the processed data for harm on population health with respect to injuries by each weather event type](#ind-8-2-3--Summarize-the-processed-data-for-harm-on-population-health-with-respect-to-injuries-by-each-weather-event-type)
* [8.2.4 Visualize the results of the summary for the harm on population health with respect to injuries by each weather event type](#ind-8-2-4--Visualize-the-results-of-the-summary-for-the-harm-on-population-health-with-respect-to-injuries-by-each-weather-event-type)
* [8.2.4.1 Create the components of Multiplot 1.2](#ind-8-2-4-1--Create-the-components-of-Multiplot-1-2)
* [8.2.4.1.1 Create The Plot 1.2.1](#ind-8-2-4-1-1--Create-The-Plot-1-2-1)
* [8.2.4.1.2 Create The Plot 1.2.2](#ind-8-2-4-1-2--Create-The-Plot-1-2-2)
* [8.2.4.1.3 Create The Plot 1.2.3](#ind-8-2-4-1-3--Create-The-Plot-1-2-3)
* [8.2.4.1.4 Create The Plot 1.2.4](#ind-8-2-4-1-4--Create-The-Plot-1-2-4)
* [8.2.4.2 Compose the Multiplot 1.2](#ind-8-2-4-2--Compose-the-Multiplot-1-2)
* [8.3 Harm On Population Health With Respect To Casualties By Each Weather Event Type](#ind-8-3--Harm-On-Population-Health-With-Respect-To-Casualties-By-Each-Weather-Event-Type)
* [8.3.1 Extract the target data for harm on population health with respect to casualties](#ind-8-3-1--Extract-the-target-data-for-harm-on-population-health-with-respect-to-casualties)
* [8.3.2 Process the target data for harm on population health with respect to casualties](#ind-8-3-2--Process-the-target-data-for-harm-on-population-health-with-respect-to-casualties)
* [8.3.3 Summarize the processed data for harm on population health with respect to casualties by each weather event type](#ind-8-3-3--Summarize-the-processed-data-for-harm-on-population-health-with-respect-to-casualties-by-each-weather-event-type)
* [8.3.4 Visualize the results of the summary for the harm on population health with respect to casualties by each weather event type](#ind-8-3-4--Visualize-the-results-of-the-summary-for-the-harm-on-population-health-with-respect-to-casualties-by-each-weather-event-type)
* [8.3.4.1 Create the components of Multiplot 1.3](#ind-8-3-4-1--Create-the-components-of-Multiplot-1-3)
* [8.3.4.1.1 Create The Plot 1.3.1](#ind-8-3-4-1-1--Create-The-Plot-1-3-1)
* [8.3.4.1.2 Create The Plot 1.3.2](#ind-8-3-4-1-2--Create-The-Plot-1-3-2)
* [8.3.4.1.3 Create The Plot 1.3.3](#ind-8-3-4-1-3--Create-The-Plot-1-3-3)
* [8.3.4.1.4 Create The Plot 1.3.4](#ind-8-3-4-1-4--Create-The-Plot-1-3-4)
* [8.3.4.2 Compose the Multiplot 1.3](#ind-8-3-4-2--Compose-the-Multiplot-1-3)
* [9 HARM ON ECONOMY](#ind-9--HARM-ON-ECONOMY)
* [9.1 Harm On Economy With Respect To Property Damage By Each Weather Event Type](#ind-9-1--Harm-On-Economy-With-Respect-To-Property-Damage-By-Each-Weather-Event-Type)
* [9.1.1 Extract the target data for harm on economy with respect to property damage](#ind-9-1-1--Extract-the-target-data-for-harm-on-economy-with-respect-to-property-damage)
* [9.1.2 Process the target data for harm on economy with respect to property damage](#ind-9-1-2--Process-the-target-data-for-harm-on-economy-with-respect-to-property-damage)
* [9.1.3 Summarize the processed data for harm on economy with respect to property damage by each weather event type](#ind-9-1-3--Summarize-the-processed-data-for-harm-on-economy-with-respect-to-property-damage-by-each-weather-event-type)
* [9.1.4 Visualize the results of the summary for the harm on economy with respect to property damage by each weather event type](#ind-9-1-4--Visualize-the-results-of-the-summary-for-the-harm-on-economy-with-respect-to-property-damage-by-each-weather-event-type)
* [9.1.4.1 Create the components of Multiplot 2.1](#ind-9-1-4-1--Create-the-components-of-Multiplot-2-1)
* [9.1.4.1.1 Create The Plot 2.1.1](#ind-9-1-4-1-1--Create-The-Plot-2-1-1)
* [9.1.4.1.2 Create The Plot 2.1.2](#ind-9-1-4-1-2--Create-The-Plot-2-1-2)
* [9.1.4.1.3 Create The Plot 2.1.3](#ind-9-1-4-1-3--Create-The-Plot-2-1-3)
* [9.1.4.1.4 Create The Plot 2.1.4](#ind-9-1-4-1-4--Create-The-Plot-2-1-4)
* [9.1.4.2 Compose the Multiplot 2.1](#ind-9-1-4-2--Compose-the-Multiplot-2-1)
* [9.2 Harm On Economy With Respect To Crop Damage By Each Weather Event Type](#ind-9-2--Harm-On-Economy-With-Respect-To-Crop-Damage-By-Each-Weather-Event-Type)
* [9.2.1 Extract the target data for harm on economy with respect to crop damage](#ind-9-2-1--Extract-the-target-data-for-harm-on-economy-with-respect-to-crop-damage)
* [9.2.2 Process the target data for harm on economy with respect to crop damage](#ind-9-2-2--Process-the-target-data-for-harm-on-economy-with-respect-to-crop-damage)
* [9.2.3 Summarize the processed data for harm on economy with respect to crop damage by each weather event type](#ind-9-2-3--Summarize-the-processed-data-for-harm-on-economy-with-respect-to-crop-damage-by-each-weather-event-type)
* [9.2.4 Visualize the results of the summary for the harm on economy with respect to crop damage by each weather event type](#ind-9-2-4--Visualize-the-results-of-the-summary-for-the-harm-on-economy-with-respect-to-crop-damage-by-each-weather-event-type)
* [9.2.4.1 Create the components of Multiplot 2.2](#ind-9-2-4-1--Create-the-components-of-Multiplot-2-2)
* [9.2.4.1.1 Create The Plot 2.2.1](#ind-9-2-4-1-1--Create-The-Plot-2-2-1)
* [9.2.4.1.2 Create The Plot 2.2.2](#ind-9-2-4-1-2--Create-The-Plot-2-2-2)
* [9.2.4.1.3 Create The Plot 2.2.3](#ind-9-2-4-1-3--Create-The-Plot-2-2-3)
* [9.2.4.1.4 Create The Plot 2.2.4](#ind-9-2-4-1-4--Create-The-Plot-2-2-4)
* [9.2.4.2 Compose the Multiplot 2.2](#ind-9-2-4-2--Compose-the-Multiplot-2-2)
* [9.3 Harm On Economy With Respect To Economic Damage By Each Weather Event Type](#ind-9-3--Harm-On-Economy-With-Respect-To-Economic-Damage-By-Each-Weather-Event-Type)
* [9.3.1 Extract the target data for harm on economy with respect to economic damage](#ind-9-3-1--Extract-the-target-data-for-harm-on-economy-with-respect-to-economic-damage)
* [9.3.2 Process the target data for harm on economy with respect to economic damage](#ind-9-3-2--Process-the-target-data-for-harm-on-economy-with-respect-to-economic-damage)
* [9.3.3 Summarize the processed data for harm on economy with respect to economic damage by each weather event type](#ind-9-3-3--Summarize-the-processed-data-for-harm-on-economy-with-respect-to-economic-damage-by-each-weather-event-type)
* [9.3.4 Visualize the results of the summary for the harm on economy with respect to economic damage by each weather event type](#ind-9-3-4--Visualize-the-results-of-the-summary-for-the-harm-on-economy-with-respect-to-economic-damage-by-each-weather-event-type)
* [9.3.4.1 Create the components of Multiplot 2.3](#ind-9-3-4-1--Create-the-components-of-Multiplot-2-3)
* [9.3.4.1.1 Create The Plot 2.3.1](#ind-9-3-4-1-1--Create-The-Plot-2-3-1)
* [9.3.4.1.2 Create The Plot 2.3.2](#ind-9-3-4-1-2--Create-The-Plot-2-3-2)
* [9.3.4.1.3 Create The Plot 2.3.3](#ind-9-3-4-1-3--Create-The-Plot-2-3-3)
* [9.3.4.1.4 Create The Plot 2.3.4](#ind-9-3-4-1-4--Create-The-Plot-2-3-4)
* [9.3.4.2 Compose the Multiplot 2.3](#ind-9-3-4-2--Compose-the-Multiplot-2-3)
* [10 RESULTS](#ind-10--RESULTS)
* [10.1 Question 1: Across the United States, which types of events
(as indicated in the EVTYPE variable) are most harmful
with respect to population health?](#ind-10-1--results-harm-on-population-health)
* [10.1.1 Overview of results for the harm on population health](#ind-10-1-1--Overview-of-results-for-the-harm-on-population-health)
* [10.1.2 Most harmful event types with respect to fatalities](#ind-10-1-2--Most-harmful-weather-event-types-with-respect-to-fatalities)
* [10.1.3 Most harmful event types with respect to injuries](#ind-10-1-3--Most-harmful-weather-event-types-with-respect-to-injuries)
* [10.1.4 Most harmful event types with respect to casualties](#ind-10-1-4--Most-harmful-weather-event-types-with-respect-to-casualties)
* [10.2 Question 2: Across the United States, which types of events
have the greatest economic consequences?](#ind-10-2--results-harm-on-economy)
* [10.2.1 Overview of results for the harm on economy](#ind-10-2-1--Overview-of-results-for-the-harm-on-economy)
* [10.2.2 Most harmful event types with respect to property damage](#ind-10-2-2--Most-harmful-weather-event-types-with-respect-to-property-damage)
* [10.2.3 Most harmful event types with respect to crop damage](#ind-10-2-3--Most-harmful-weather-event-types-with-respect-to-crop-damage)
* [10.2.4 Most harmful event types with respect to economic damage](#ind-10-2-4--Most-harmful-weather-event-types-with-respect-to-economic-damage)
* [11 REPRODUCIBILITY DETAILS](#ind-11--REPRODUCIBILITY-DETAILS)
* [11.1 Session Info](#ind-11-1--Session-Info)
* [11.2 Options](#ind-11-2--Options)
* [11.3 MD5 Checksums](#ind-11-3--MD5-Checksums)
* [11.3.1 Create a utility function to export MD5 checksums](#ind-11-3-1--Create-a-utility-function-to-export-MD5-checksums)
* [11.3.2 MD5 checksum of the input file with the unprocessed data](#ind-11-3-2--MD5-checksum-of-the-input-file-with-the-unprocessed-data)
* [11.3.3 MD5 checksum of the output file with the processed data](#ind-11-3-3--MD5-checksum-for-the-output-file-with-the-processed-data)
* [11.3.4 MD5 checksum of the output files with the results](#ind-11-3-4--MD5-checksum-of-the-output-files-with-the-results)
* [11.4 Random Seed](#ind-11-4--Random-Seed)
* [12 LICENSE](#ind-12--LICENSE)
* [13 REFERENCES](#ind-13--REFERENCES)
<br>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
<br>
<br>
<br>
***
# 2 PROLOGUE {#ind-2--PROLOGUE}
***
To provide some context for the reader
with respect to *what this is all about*,
some general information was included:
* [2.1 About The Assignment](#ind-2-1--About-The-Assignment)
* [2.2 About The Main Script](#ind-2-2--About-The-Main-Script)
* [2.3 About The Report](#ind-2-3--About-The-Report)
A summary of the analysis is not included in this chapter;
it can be found in the chapter [SYNOPSIS](#ind-3--SYNOPSIS).
<br>
<font size="1">[back to start of this chapter](#ind-2--PROLOGUE)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
## 2.1 About The Assignment {#ind-2-1--About-The-Assignment}
This project was created for the 2nd peer-graded assignment of:
> Course 5: Reproducible Research,
> from Data Science Specialization,
> by Johns Hopkins University,
> at Coursera
The course is taught by:
* Jeff Leek, PhD
* Roger D. Peng, PhD
* Brian Caffo, PhD
As the course instructors put it:
> The basic goal of this assignment is to explore the NOAA Storm Database
and answer some basic questions about severe weather events.
You must use the database to answer the questions below
and show the code for your entire analysis.
Your analysis can consist of tables, figures, or other summaries.
You may use any R package you want to support your analysis.
The assignment asks for two questions to be addressed:
> Your data analysis must address the following questions:
>
> **Question 1:** Across the United States, which types of events
(as indicated in the EVTYPE variable) are most harmful with respect to
population health?
>
> **Question 2:** Across the United States, which types of events have
the greatest economic consequences?
based on the observations from the [supplied dataset
](https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2):
> The data for this assignment come in the form of a comma-separated-value file
compressed via the bzip2 algorithm to reduce its size.
Some quite general guidelines and a tip were provided:
> Consider writing your report as if it were to be read by a government or
municipal manager who might be responsible for preparing for severe weather events
and will need to prioritize resources for different types of events.
However, there is no need to make any specific recommendations in your report.
> The events in the database start in the year 1950 and end in November 2011.
In the earlier years of the database there are generally fewer events recorded,
most likely due to a lack of good records. More recent years should be considered
more complete.
A more educational approach was deliberately adopted,
aiming to produce a well-justified and self-explanatory product
that can serve as a guide for a beginner on how a basic pipeline
can be constructed to produce a report with an analysis from scratch.
**All the requirements for the assignment were followed, with one exception:**
* __due to the book-like structure
that was adopted for the report
it was considered more appropriate
to include the [SYNOPSIS](#ind-3--SYNOPSIS)
not immediately after the title,
but as a separate chapter after the
[PROLOGUE](#ind-2--PROLOGUE)__
<br>
<font size="1">[back to start of this section](#ind-2-1--About-The-Assignment)</font>
<font size="1">[back to start of this chapter](#ind-2--PROLOGUE)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
## 2.2 About The Main Script {#ind-2-2--About-The-Main-Script}
The GitHub repository
https://github.com/jzstats/Reproducible-Research--2nd-Assignment
hosts all the material relevant to this project,
including the main script [RepRes_analysis.Rmd
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/RepRes_analysis.Rmd),
which contains the code used to conduct the analysis.
When *knitted* directly from RStudio, it produces the Markdown file
[RepRes_analysis.md
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/RepRes_analysis.md)
with the analysis.
In addition, it was rendered with the script [render\_\_\_\_\_RepRes_analysis.R
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/render_____RepRes_analysis.R)
(as explained in the following section of this chapter,
[2.3 About The Report](#ind-2-3--About-The-Report))
to produce a *bookdown* variation
that was uploaded to RPubs and used to populate the
[webpage](https://jzstats.github.io/Reproducible-Research--2nd-Assignment/)
that was created to showcase this project.
<br>
<font size="1">[back to start of this section](#ind-2-2--About-The-Main-Script)</font>
<font size="1">[back to start of this chapter](#ind-2--PROLOGUE)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
## 2.3 About The Report {#ind-2-3--About-The-Report}
The main Rmd file, [RepRes_analysis.Rmd
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/RepRes_analysis.Rmd),
which contains the code to conduct the analysis
and produces the Markdown document [RepRes_analysis.md
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/RepRes_analysis.md),
was rendered with the script [render\_\_\_\_\_RepRes_analysis.R
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/render\_\_\_\_\_RepRes_analysis.R)
to create a *bookdown* version of the report with the analysis,
which is hosted at the [webpage
](https://jzstats.github.io/Reproducible-Research--2nd-Assignment/)
created to showcase this project:
* [Report
](https://jzstats.github.io/Reproducible-Research--2nd-Assignment/Report.html)
* A more visually appealing and practical
(thanks to its side panel with the table of contents)
book-like version of the report, powered by the [rmdformats
](https://cran.rstudio.com/web/packages/rmdformats/index.html) library.
It was produced by rendering [RepRes_analysis.Rmd
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/RepRes_analysis.Rmd)
with the script [render\_\_\_\_\_RepRes_analysis.R
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/render\_\_\_\_\_RepRes_analysis.R).
This is the version that was uploaded to RPubs at this [link
](https://rpubs.com/JZstats/Reproducible-Research--2nd-Assignment).
<br>
<font size="1">[back to start of this section](#ind-2-3--About-The-Report)</font>
<font size="1">[back to start of this chapter](#ind-2--PROLOGUE)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
<br>
<br>
<br>
***
# 3 SYNOPSIS {#ind-3--SYNOPSIS}
***
The U.S. National Oceanic and Atmospheric Administration's (NOAA)
*Storm Events Database* was explored to identify
the most harmful weather event types,
among the weather phenomena defined in
_NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007 (*at chapter 7*)_,
with respect to population health and economy.
The raw data was loaded into R from the supplied file
and preprocessed,
the target data subset was extracted,
in-record validation was conducted,
the majority of missing values were imputed
(via a deterministic and conservative approach),
the observations were cross-validated,
and finally the table with the processed data was created,
which contained all the information needed
to address the two questions of interest:
1. Across the United States, which types of events
(as indicated in the EVTYPE variable)
are most harmful with respect to population health?
2. Across the United States,
which types of events have the greatest economic consequences?
For the first question,
the harm on population health by each weather event type was
evaluated (separately) based on the average impact of the observations
that resulted in non-zero damage over each of the three perspectives
(fatalities, injuries and casualties) that were considered to be of importance.
Similarly for the second question,
the harm on economy by each weather event type was
evaluated (separately) based on the average impact from the observations
that resulted in non-zero damage over each of the three perspectives
(property damage, crop damage and economic damage)
that were considered to be of importance.
For both questions,
the main criterion used to rank the included weather event types
(from the most harmful to the least) for each perspective
was the overall average damage
(with respect to that perspective)
over the observations that caused non-zero damage.
In addition, the average for the 90% of cases with the lowest impact
and the average for the 10% of cases with the highest impact
(for each of the included weather event types)
were reported to provide a more complete and insightful 'picture'
of the consequences caused by each weather event type,
because for all perspectives
the impact distributions of the majority of weather event types
were highly positively skewed.
The analysis was structured, performed and documented in such a way
that it fortifies the reproducibility of the report
and explains every required detail so that even the non-expert
can follow the procedure and understand the thought process
behind the decision making at each stage.
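
The ranking criterion described above can be illustrated with a minimal sketch in base R. The toy values, the column names and the helper `summarise_impact()` are assumptions made for illustration only; they are not the code or the data of the analysis, and the exact way the 90%/10% split is computed in the report may differ.

```r
# Toy data: one row per observation that caused non-zero fatalities.
# Column names and values are made up for illustration.
toy <- data.frame(
  event_type = rep(c("TORNADO", "FLOOD"), each = 10),
  fatalities = c(1, 1, 1, 1, 1, 2, 2, 3, 4, 84,
                 1, 1, 1, 1, 1, 1, 1, 2, 2, 9)
)

# Hypothetical helper: overall mean, mean of the 90% of cases with the
# lowest impact and mean of the 10% of cases with the highest impact.
# The sketch assumes enough observations for both parts to be non-empty.
summarise_impact <- function(x) {
  x <- sort(x)
  n_low <- floor(0.9 * length(x))
  c(
    mean_overall    = mean(x),
    mean_lowest_90  = mean(head(x, n_low)),
    mean_highest_10 = mean(tail(x, length(x) - n_low))
  )
}

# Summarise each event type and rank by the overall mean (descending).
per_type <- lapply(split(toy$fatalities, toy$event_type), summarise_impact)
ranking <- as.data.frame(do.call(rbind, per_type))
ranking <- ranking[order(-ranking$mean_overall), ]
print(ranking)
```

In this toy example a single extreme observation dominates the overall mean of one event type, which is the kind of positive skew that motivates reporting the two partial averages next to the overall one.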
<br>
<font size="1">[back to start of this chapter](#ind-3--SYNOPSIS)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
<br>
<br>
<br>
***
# 4 STORM EVENTS DATASET {#ind-4--STORM-EVENTS-DATASET}
***
To conduct the analysis for this project,
the file with the raw data [repdata_data_StormData.csv.bz2
](https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2)
was used,
which contains data from the *Storm Events Dataset*
gathered and made publicly available
by the U.S. National Oceanic and Atmospheric Administration (NOAA).
Some [general information](#ind-4-1--General-Informations)
as well as two [points of interest](#ind-4-2--Points-Of-Interest)
about the dataset:
* [4.2.1 Changes in the composition of weather event types
](#ind-4-2-1--Changes-in-the-composition-of-weather-event-types)
* [4.2.2 Eligibility criteria for inclusion of weather events in the dataset
](#ind-4-2-2--Eligibility-criteria-for-inclusion-of-weather-events-in-the-dataset)
were discussed to provide the necessary insights
in order to understand why the decisions which govern
the approach adopted in this analysis were made.
<br>
<font size="1">[back to start of this chapter](#ind-4--STORM-EVENTS-DATASET)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
## 4.1 General Informations {#ind-4-1--General-Informations}
The version of the dataset used in this analysis
contains observations for the severe weather events
that happened (or more accurately began)
from January 1950 to November 2011 in the United States.
Further details about the dataset (which was used in this analysis)
can be found in the supplemental material
provided with the instructions of the assignment:
* [NATIONAL WEATHER SERVICE INSTRUCTION 10-1605 (AUGUST 17, 2007)
](https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf)
(also available at the GitHub repository created to support this project through
[this link
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/supplemental_information/NATIONAL%20WEATHER%20SERVICE%20INSTRUCTION%2010-1605%20(AUGUST%2017%2C%202007).pdf))
* [Storm Data FAQ Page](https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf)
(also available at the GitHub repository created to support this project through
[this link
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/supplemental_information/NCDC%20Storm%20Events-FAQ%20Page.pdf))
For additional information on the Storm Events Dataset,
as well as an updated and cleaner version of the data,
with observations from January 1950 up to January 2020
(at the time this report was produced;
the dataset is expected to keep being updated),
it is recommended to visit and explore:
* [NOAA's Storm Events Dataset official webpage](https://www.ncdc.noaa.gov/stormevents)
Finally, a document with detailed information on the history of the dataset
was available at [NOAA's Storm Events Dataset webpage for the version history](https://www.ncdc.noaa.gov/stormevents/versions.jsp):
* [The History of the Storm Events Database
](https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/The-History-of-the-Storm-Events-Database.docx)
(also available at the GitHub repository created to support this project through
[this link
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/supplemental_information/The-History-of-the-Storm-Events-Database.docx))
<br>
<font size="1">[back to start of this section](#ind-4-1--General-Informations)</font>
<font size="1">[back to start of this chapter](#ind-4--STORM-EVENTS-DATASET)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
## 4.2 Points Of Interest {#ind-4-2--Points-Of-Interest}
In order to understand why some of the decisions
which govern the approach adopted in this analysis were made,
it is essential to take into account
two crucial facts with respect to the observations recorded in the dataset:
* [4.2.1 Changes in the composition of weather event types
](#ind-4-2-1--Changes-in-the-composition-of-weather-event-types)
    * Both the composition of the weather events types
      that were recorded in the dataset
      and the way the data was entered in the system
      (the data entry procedure and the database software)
      changed several times across the years.
* [4.2.2 Eligibility criteria for inclusion of weather events in the dataset
](#ind-4-2-2--Eligibility-criteria-for-inclusion-of-weather-events-in-the-dataset)
    * Not every weather event that occurred
      in the period that the dataset spans
      was automatically eligible to be recorded in the dataset.
      Only those that have caused harm (either to population health or to economy)
      or have gathered public interest were recorded.
<br>
<font size="1">[back to start of this section](#ind-4-2--Points-Of-Interest)</font>
<font size="1">[back to start of this chapter](#ind-4--STORM-EVENTS-DATASET)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
### 4.2.1 Changes in the composition of weather event types {#ind-4-2-1--Changes-in-the-composition-of-weather-event-types}
Through the years, as the popularity of the dataset grew,
several aspects governing the data collection procedure changed
in order to expand, enrich and fortify the quality of the data.
As a result, the number of defined weather event types
that were collected increased several times, starting from just one (*TORNADO*)
for the first few years and expanding to 48 defined weather event types
at the time the dataset used in this analysis was created.
Consequently, there are inconsistencies in the composition of weather event types
between different periods that could affect the integrity of the analysis.
Furthermore, for the period from 1996 up to 2000,
while the number of weather event types that were being recorded
had already increased significantly,
the values for the weather event type entries were entered
through a free text field,
resulting in more than 950 different unique entries.
For this reason **it was decided to use for the analysis
only the part with observations since January 2001**,
for which, as a result of the introduction of a drop-down menu
and the removal of the free text field
for the entries of the weather event type values,
the majority of observations don't suffer from such problems
and the weather event types contained include the majority
of the latest defined weather event types.
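
The restriction to observations since January 2001 can be sketched in base R as follows. The column name `BGN_DATE`, its `month/day/year hour:minute:second` text format and the toy rows are assumptions made for illustration; the extraction actually used in the analysis is the one documented in the data processing chapter.

```r
# Toy table; BGN_DATE and its text format are assumptions about the raw file,
# and the rows are made up for illustration.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "12/31/2000 0:00:00",
               "1/1/2001 0:00:00", "11/30/2011 0:00:00"),
  EVTYPE = c("TORNADO", "TSTM WIND", "HAIL", "FLOOD"),
  stringsAsFactors = FALSE
)

# Parse the date part of the text (the trailing time part is ignored).
toy$begin_date <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")

# Keep only the observations that began on or after 1 January 2001.
target <- toy[toy$begin_date >= as.Date("2001-01-01"), ]
print(target$EVTYPE)
```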
<br>
<font size="1">[back to start of this subsection](#ind-4-2-1--Changes-in-the-composition-of-weather-event-types)</font>
<font size="1">[back to start of this section](#ind-4-2--Points-Of-Interest)</font>
<font size="1">[back to start of this chapter](#ind-4--STORM-EVENTS-DATASET)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
### 4.2.2 Eligibility criteria for inclusion of weather events in the dataset {#ind-4-2-2--Eligibility-criteria-for-inclusion-of-weather-events-in-the-dataset}
Out of all weather events that happened
in the period from January 2001 to November 2011 in the United States
and were classified as one of the types that were recorded
(at the period they occurred),
only those in the subset that belonged
to at least one of the following three groups
were eligible to be included in the dataset:
1. The occurrence of storms and other significant weather phenomena
having sufficient intensity to cause loss of life, injuries,
significant property damage, and/or disruption to commerce.
2. Rare, unusual, weather phenomena that generate media attention.
3. Other significant meteorological events,
such as record maximum or minimum temperatures or precipitation
that occur in connection with another event.
An important implication of the above policy must be highlighted:
* Of all the weather phenomena that happened
in the period from January 2001 to November 2011 in the United States
and were of a type that was recorded at the time they occurred,
the dataset contains only the subset of those
that either resulted in harm (to population health or to economy)
or gathered high publicity.
* Conversely, all the weather phenomena that happened
in the period from January 2001 to November 2011 in the United States
and neither caused any harm (to population health or to economy)
nor gathered high public interest
were ignored, even if they were of a type
that was recorded at the time they occurred.
Consequently, any conclusion made for a weather event type in *general*
will inevitably be biased, as it will overestimate the
consequences with respect to the harm caused
(either to population health or to economy),
because the available sample is, by design, not representative of
the overall population of weather phenomena (of the types that were recorded).
For this reason **it was decided to use for the analysis:**
* __Only the subset of observations that resulted in non-zero harm
with respect to each of the perspectives of interest
(fatalities, injuries and casualties)
in order to determine the most harmful weather event types
for the population health.__
* __Only the subset of observations that resulted in non-zero harm
with respect to each of the perspectives of interest
(property damage, crop damage and economic damage)
in order to determine the most harmful weather event types
for the economy.__
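
A minimal sketch of the subsetting rule above, in base R. The column names, the toy values and the helper `mean_nonzero()` are hypothetical, and treating casualties as the sum of fatalities and injuries is an assumption made here for illustration.

```r
# Toy observations; column names and values are illustrative assumptions.
toy <- data.frame(
  event_type = c("HAIL", "HAIL", "HAIL", "FLOOD", "FLOOD", "FLOOD"),
  fatalities = c(0, 0, 2, 0, 1, 3),
  injuries   = c(0, 4, 0, 0, 0, 6)
)
# Assumption: casualties are taken here as fatalities plus injuries.
toy$casualties <- toy$fatalities + toy$injuries

# Hypothetical helper: average per event type, using only the observations
# with non-zero harm for the given perspective.
mean_nonzero <- function(data, perspective) {
  subset_nonzero <- data[data[[perspective]] > 0, ]
  tapply(subset_nonzero[[perspective]], subset_nonzero$event_type, mean)
}

fatalities_avg <- mean_nonzero(toy, "fatalities")
injuries_avg   <- mean_nonzero(toy, "injuries")
casualties_avg <- mean_nonzero(toy, "casualties")
print(casualties_avg)
```

Note that the subset differs per perspective: an observation with injuries but no fatalities contributes to the injuries and casualties averages, but not to the fatalities average.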
<br>
<font size="1">[back to start of this subsection](#ind-4-2-2--Eligibility-criteria-for-inclusion-of-weather-events-in-the-dataset)</font>
<font size="1">[back to start of this section](#ind-4-2--Points-Of-Interest)</font>
<font size="1">[back to start of this chapter](#ind-4--STORM-EVENTS-DATASET)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
<br>
<br>
<br>
***
# 5 PRELIMINARY ACTIVITIES {#ind-5--PRELIMINARY-ACTIVITIES}
***
Executes four preliminary tasks
in order to ensure (and set up, when needed and possible)
that the working directory and the R session
are ready to proceed with the analysis:
* [5.1 Set The Random Seed](#ind-5-1--Set-The-Random-Seed)
    * Sets a random seed to make the random events reproducible.
* [5.2 Load All Required Libraries](#ind-5-2--Load-All-Required-Libraries)
    * Loads all libraries required to conduct the analysis and produce the report.
* [5.3 Create All Required Directories](#ind-5-3--Create-All-Required-Directories)
    * Creates (if it doesn't exist) a directory tree (in the working directory)
      to which the output files will be exported.
* [5.4 Access The File With The Raw Data](#ind-5-4--Access-The-File-With-The-Raw-Data)
    * Downloads the file with the raw data, [repdata_data_StormData.csv.bz2
      ](https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2),
      to the working directory, if it doesn't already exist.
<br>
<font size="1">[back to start of this chapter](#ind-5--PRELIMINARY-ACTIVITIES)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
## 5.1 Set The Random Seed {#ind-5-1--Set-The-Random-Seed}
In an attempt to fortify the reproducibility of the random events,
the number *1234567890* was explicitly chosen and set as the random seed.
```{r set_a_random_seed_for_the_execution_of_the_script}
# Select a random seed.
selected_random_seed <- 1234567890
# Set the selected random seed.
set.seed(selected_random_seed)
```
*Note that the only random events that took place in this analysis
were the assignment of random positions for the labels in the plots:*
* _[Plot 1.1.4](#ind-8-1-4-1-4--Create-The-Plot-1-1-4)_
* _[Plot 1.2.4](#ind-8-2-4-1-4--Create-The-Plot-1-2-4)_
* _[Plot 1.3.4](#ind-8-3-4-1-4--Create-The-Plot-1-3-4)_
* _[Plot 2.1.4](#ind-9-1-4-1-4--Create-The-Plot-2-1-4)_
* _[Plot 2.2.4](#ind-9-2-4-1-4--Create-The-Plot-2-2-4)_
* _[Plot 2.3.4](#ind-9-3-4-1-4--Create-The-Plot-2-3-4)_
*by the function geom_label_repel() from the ggrepel library.*
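
As a small illustration of what setting the seed provides, the sketch below uses `runif()` from base R as a stand-in for any random procedure (it is not the label placement itself): resetting the same seed before each draw reproduces the same values.

```r
selected_random_seed <- 1234567890

# First draw, after setting the seed.
set.seed(selected_random_seed)
first_draw <- runif(3)

# Second draw, after resetting the same seed.
set.seed(selected_random_seed)
second_draw <- runif(3)

# The two draws are identical because the generator state was restored.
print(identical(first_draw, second_draw))
```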
<br>
<font size="1">[back to start of this section](#ind-5-1--Set-The-Random-Seed)</font>
<font size="1">[back to start of this chapter](#ind-5--PRELIMINARY-ACTIVITIES)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
## 5.2 Load All Required Libraries {#ind-5-2--Load-All-Required-Libraries}
Loads all libraries required to conduct the analysis and produce the report.
```{r load_all_required_libraries}
# Load all required libraries.
library(tools)
library(rmarkdown)
library(knitr)
library(kableExtra)
library(magrittr)
library(DT)
library(rmdformats)
library(data.table)
library(validate)
library(stringr)
library(moments)
library(ggplot2)
library(ggrepel)
library(grid)
library(gridExtra)
```
*Note that the library:*
- *rmdformats*
- *which was only used to produce the [Report
](https://jzstats.github.io/Reproducible-Research--2nd-Assignment/Report.html)*
*is not essential to conduct the analysis.*
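
Before rendering, it may be helpful to check which of the libraries listed above are installed. The following is a minimal sketch and not part of the original script; the variable names are hypothetical, and note that `requireNamespace()` loads the namespace of every package it finds.

```r
# The packages loaded in the chunk above.
required_packages <- c(
  "tools", "rmarkdown", "knitr", "kableExtra", "magrittr", "DT",
  "rmdformats", "data.table", "validate", "stringr", "moments",
  "ggplot2", "ggrepel", "grid", "gridExtra"
)

# TRUE for every package that can be loaded, FALSE otherwise.
is_installed <- vapply(
  X = required_packages,
  FUN = requireNamespace,
  FUN.VALUE = logical(1),
  quietly = TRUE
)

# Report the packages that would have to be installed first.
missing_packages <- required_packages[!is_installed]
if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
}
```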
<br>
<font size="1">[back to start of this section](#ind-5-2--Load-All-Required-Libraries)</font>
<font size="1">[back to start of this chapter](#ind-5--PRELIMINARY-ACTIVITIES)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
## 5.3 Create All Required Directories {#ind-5-3--Create-All-Required-Directories}
During the execution of the main script, [RepRes_analysis.Rmd
](https://github.com/jzstats/Reproducible-Research--2nd-Assignment/blob/master/RepRes_analysis.Rmd),
several outputs are produced
(which are also included in the report),
mostly to further enhance the reproducibility of the analysis.
All those files are exported to appropriate sub-directories
inside the directory named *outputs*,
which is created in the working directory.
```{r create_the_directory_tree_for_the_outputs_of_the_script}
# Create a list with the paths to all sub-directories
# of the directory tree for the outputs of this analysis.
directory_tree_____outputs <- list(
  "filepath_____outputs_____processed_data" =
    file.path("outputs", "processed_data"),
  "filepath_____outputs_____harm_on_population_health_____figures" =
    file.path("outputs", "harm_on_population_health", "figures"),
  "filepath_____outputs_____harm_on_population_health_____results" =
    file.path("outputs", "harm_on_population_health", "results"),
  "filepath_____outputs_____harm_on_economy_____figures" =
    file.path("outputs", "harm_on_economy", "figures"),
  "filepath_____outputs_____harm_on_economy_____results" =
    file.path("outputs", "harm_on_economy", "results"),
  "filepath_____outputs_____reproducibility_support_____r_session" =
    file.path("outputs", "reproducibility_support", "r_session"),
  "filepath_____outputs_____reproducibility_support_____MD5_checksums" =
    file.path("outputs", "reproducibility_support", "MD5_checksums")
)
# Create the directory tree for the outputs of the analysis.
invisible(lapply(
  X = directory_tree_____outputs,
  FUN = function(filepath_of_subdirectory) {
    if ( ! dir.exists(filepath_of_subdirectory) ) {
      dir.create(filepath_of_subdirectory, recursive = TRUE)
    }
  }
))
# Check if all sub-directories of the directory for the outputs of the analysis
# were successfully created.
do_the_directories_exists <- vapply(
  X = directory_tree_____outputs,
  FUN = dir.exists,
  FUN.VALUE = logical(1)
)
# If any of the sub-directories
# required for the outputs of the analysis
# could not be created, the process terminates.
if (any(!do_the_directories_exists)) {
  stop(
    "\n",
    "Failed to create the directories: ", "\n",
    paste0("\t", directory_tree_____outputs[!do_the_directories_exists], "\n"),
    "The process is aborted for now.", "\n",
    "Please rerun the script or create the required sub-directories manually.",
    "\n"
  )
}
```
*If any of the sub-directories in the directory tree
for the outputs of the analysis cannot be created, the process terminates.*
<br>
<font size="1">[back to start of this section](#ind-5-3--Create-All-Required-Directories)</font>
<font size="1">[back to start of this chapter](#ind-5--PRELIMINARY-ACTIVITIES)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
## 5.4 Access The File With The Raw Data {#ind-5-4--Access-The-File-With-The-Raw-Data}
The file with name [repdata_data_StormData.csv.bz2
](https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2),
which contains data from the [Storm Events Dataset](#ind-4--STORM-EVENTS-DATASET)
was supplied for this assignment and used to conduct the analysis.
If the file doesn't already exist in the working directory,
an attempt will be made to download it automatically.
```{r access_the_file_with_the_compressed_raw_data}
# Path to the file with the compressed raw data.
filepath_____unprocessed_data <- "repdata_data_StormData.csv.bz2"
# The link supplied by the instructions of the assignment
# to download the file with the compressed raw data.
url_to_download_the_data_file <-
  "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Check if the file 'repdata_data_StormData.csv.bz2',
# with the compressed raw data is available in the working directory.
## if it doesn't exist...
if ( !file.exists(filepath_____unprocessed_data) ) {
  message(
    "\n",
    "The file, '", filepath_____unprocessed_data, "'", "\n",
    "doesn't exist in the working directory.",
    "\n"
  )
  message(
    "\n", "Trying to download the file, ", "\n",
    "'", filepath_____unprocessed_data, "' ", "\n",
    "with the raw data from the url: ", "\n",
    "\t", "'", url_to_download_the_data_file, "'"
  )
  ### ...an attempt is made to download it from the link supplied by the assignment
  ### (binary mode is requested explicitly, as the file is compressed)
  try(
    download.file(
      url = url_to_download_the_data_file,
      destfile = filepath_____unprocessed_data,
      mode = "wb"
    )
  )
  ## Checks if the file 'repdata_data_StormData.csv.bz2'
  ## was successfully downloaded.
  ### in case the file is not found in the working directory
  ### after the attempt to download
  ### the process terminates with an informative message
  ### that explains the situation to the user
  if ( !file.exists(filepath_____unprocessed_data) ) {
    stop(
      "\n",
      "Failed to download the required file,", "\n",
      "'", filepath_____unprocessed_data, "'", "\n",
      "with the raw data.", "\n",
      "The process is aborted for now."
    )
  }
}
```
*If the download fails, the process terminates.*
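
The check above only confirms that a file exists after the download attempt; it cannot tell a complete file from a truncated one. One possible complement, sketched below under stated assumptions, is to compare the MD5 checksum of the file with a reference value. The helper `is_file_intact()` is hypothetical, and the demonstration uses a small temporary file rather than the storm data, since a reference checksum for the raw data file would first have to be recorded from a trusted copy.

```r
# Hypothetical helper: TRUE only if the file exists and its MD5 checksum
# matches the expected value.
is_file_intact <- function(filepath, expected_md5) {
  file.exists(filepath) &&
    identical(unname(tools::md5sum(filepath)), expected_md5)
}

# Demonstration on a small temporary file, not on the storm data.
demo_file <- tempfile(fileext = ".txt")
writeBin(charToRaw("hello"), demo_file)
reference_md5 <- unname(tools::md5sum(demo_file))

# The untouched file matches its reference checksum.
intact_before <- is_file_intact(demo_file, reference_md5)

# Simulate a truncated download by overwriting the file with fewer bytes.
writeBin(charToRaw("hell"), demo_file)
intact_after <- is_file_intact(demo_file, reference_md5)

print(c(before = intact_before, after = intact_after))
```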
<br>
<font size="1">[back to start of this section](#ind-5-4--Access-The-File-With-The-Raw-Data)</font>
<font size="1">[back to start of this chapter](#ind-5--PRELIMINARY-ACTIVITIES)</font>
<font size="1">[back to *TABLE OF CONTENTS*](#ind-1--TABLE-OF-CONTENTS)</font>
<br>
<br>
<br>
<br>
<br>
<br>
***
# 6 DATA PROCESSING {#ind-6--DATA-PROCESSING}
***
The data processing pipeline started with a supplied file,
[*repdata_data_StormData.csv.bz2*
](https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2)
that contained raw data from the
[*Storm Events Dataset*](#ind-4--STORM-EVENTS-DATASET)
and produced the [table with the processed data](#ind-7--PROCESSED-DATA).
The pipeline consists of seven distinct stages:
1. [**Load The Raw Data In R**](#ind-6-1--Load-The-Raw-Data-In_R)
    * The table with the raw data was created
      by loading the raw data from the supplied compressed file into R,
      with all variables coerced to character type.
      Post validation was conducted and
      an overview of the table with the raw data was presented.
2. [**Preprocess The Raw Data**](#ind-6-2--Preprocess-The-Raw-Data)
    * To create the table with the preprocessed data
      from the table with the raw data,
      prerequisites about the variables required for the analysis were verified;
      those variables were then selected and coerced to their appropriate types,
      and a key was set for the table. Post validation was conducted and
      an overview of the table with the preprocessed data was presented.
3. [**6.3 Extract The Target Data Subset**](#ind-6-3--Extract-The-Target-Data-Subset)