The project I produced for the 2nd Assignment of the Course 'Reproducible Research' from 'Data Science Specialization' by 'Johns Hopkins University' at Coursera.

jzstats/Reproducible-Research--2nd-Assignment

About This Repository

This repository was created to support the project:

It contains all the material relevant to the project:

  1. RepRes_analysis.Rmd
    • The main script of the project, which contains all the code and narrative text required to produce:
      1. RepRes_analysis.md
        • The Markdown version of the report. Examining the analysis from this file is not advised, as it cannot present all the included code effectively (extra chunks of HTML were used for various purposes).
          In addition, the extensive length of the Report makes this version impractical to read.
      2. docs/Report.html
        • The version of the Report obtained by rendering RepRes_analysis.Rmd with the script render_____RepRes_analysis.R, which produces a bookdown version of the report powered by the rmdformats library. It is a visually appealing, fully functional, and practical version of the Report that includes all the code used for the analysis (folded by default).
  2. repdata_data_StormData.csv.bz2
    • The file with the compressed unprocessed data on which this analysis is based. It contains data from the Storm Events Dataset, gathered and made publicly available by the U.S. National Oceanic and Atmospheric Administration (NOAA).
  3. supplemental_information
    • Contains files with information on the Storm Events Dataset that had a major impact on the analysis:
      • supplemental_information/NATIONAL WEATHER SERVICE INSTRUCTION 10-1605 (AUGUST 17, 2007)
        • General information and directions for the Storm Events Dataset.
      • supplemental_information/NCDC Storm Events-FAQ Page.pdf
        • Frequently asked questions about the Storm Events Dataset.
      • supplemental_information/The-History-of-the-Storm-Events-Database.docx
        • Detailed information for the history of the Storm Events Dataset.
  4. outputs
    • Contains the outputs that were produced (as a side effect) during the execution of the script:
      • outputs/processed data
        • Contains the table with the processed data, obtained after the unprocessed data was loaded into R and passed through the data processing pipeline:
          • outputs/processed data/table_with_the_processed_data.R
      • outputs/harm_on_population_health
        • Contains the outputs with results for the harm on population health:
          • outputs/harm_on_population_health/figures
            • Contains Figure 1 with the overview of harm on population health by each weather event type:
              • outputs/harm_on_population_health/figures/figure_1.png
          • outputs/harm_on_population_health/results
            • Contains the tables with the results for the harm on population health by each weather event type, one file for each of the perspectives examined in the analysis:
              • outputs/harm_on_population_health/results/summary_____harm_on_population_health______fatalities.R
              • outputs/harm_on_population_health/results/summary_____harm_on_population_health______injuries.R
              • outputs/harm_on_population_health/results/summary_____harm_on_population_health______casualties.R
      • outputs/harm_on_economy
        • Contains the outputs with results for the harm on economy:
          • outputs/harm_on_economy/figures
            • Contains Figure 2 with the overview of harm on economy by each weather event type:
              • outputs/harm_on_economy/figures/figure_2.png
          • outputs/harm_on_economy/results
            • Contains the tables with the results for the harm on economy by each weather event type, one file for each of the perspectives examined in the analysis:
              • outputs/harm_on_economy/results/summary_____harm_on_population_health______property_damage.R
              • outputs/harm_on_economy/results/summary_____harm_on_population_health______crop_damage.R
              • outputs/harm_on_economy/results/summary_____harm_on_population_health______economic_damage.R
      • outputs/reproducibility_support
        • Contains information that can assist any attempt to reproduce the analysis:
          • outputs/reproducibility_support/r_session
            • Information on the R session and the options that were active when the original analysis took place:
              • outputs/reproducibility_support/r_session/session_info.R
              • outputs/reproducibility_support/r_session/r_options.R
          • outputs/reproducibility_support/MD5_checksum
            • Contains files with the MD5 checksums for the unprocessed data, processed data, and the tables with the results that were produced during the original execution of the script:
              • outputs/reproducibility_support/MD5_checksum/unprocessed_data_____MD5_checksum.txt
              • outputs/reproducibility_support/MD5_checksum/processed_data_____MD5_checksum.txt
              • outputs/reproducibility_support/MD5_checksum/results_____MD5_checksum.txt
  5. docs
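
The checksum files under outputs/reproducibility_support/MD5_checksum make it possible to confirm that a re-run starts from the same unprocessed data and ends with the same tables. A minimal sketch in base R follows. It assumes that the recorded checksum is available as a plain hexadecimal string (the layout of the .txt files is not described here), and `verify_md5` is a hypothetical helper, not part of the repository:

```r
# Hypothetical helper: compare a file's MD5 checksum with a recorded value.
# Assumption: `recorded` is the 32-character hexadecimal MD5 string.
verify_md5 <- function(path, recorded) {
  actual <- unname(tools::md5sum(path))  # tools ships with base R
  identical(tolower(actual), tolower(trimws(recorded)))
}

# Self-contained demonstration on a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines("storm data", tmp)
recorded <- unname(tools::md5sum(tmp))  # stands in for the value kept in a checksum file
verify_md5(tmp, recorded)               # TRUE while the file is unchanged

cat("extra line\n", file = tmp, append = TRUE)
verify_md5(tmp, recorded)               # FALSE once the contents differ
```

For this repository, `path` would be repdata_data_StormData.csv.bz2 and `recorded` the value read from unprocessed_data_____MD5_checksum.txt, provided that file stores the bare hash.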

About The Assignment

This project was created for the 2nd peer-graded assignment of:

Course 5: Reproducible Research,
from Data Science Specialization,
by Johns Hopkins University,
at Coursera

The course is taught by:

  • Jeff Leek, PhD
  • Roger D. Peng, PhD
  • Brian Caffo, PhD

As stated by the instructors of the course:

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

The assignment asks to address two questions:

Your data analysis must address the following questions:

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Question 2: Across the United States, which types of events have the greatest economic consequences?

based on the observations in the supplied dataset:

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.
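
R can read a bzip2-compressed CSV directly, without decompressing it first. The sketch below illustrates this with a tiny temporary file standing in for repdata_data_StormData.csv.bz2. The two rows are made-up illustration values, not records from the dataset, and this is not necessarily how RepRes_analysis.Rmd loads the data:

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative values only).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

# read.csv() opens the path through file(), which detects the bzip2
# compression and decompresses while reading.
storm <- read.csv(tmp, stringsAsFactors = FALSE)
str(storm)
```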

Some quite general guidelines and a tip were provided:

Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

A more educational approach was deliberately adopted, aiming to produce a well-justified and self-explanatory product that can serve as a guide for a beginner on how to construct a basic pipeline that yields a report with an analysis from scratch.
