I investigated vehicular accidents in the United States over a 21-year period (1996 to 2016). Because there are many ways to explore this dataset, I combined a broad overview with a closer look at a few specific relationships. I began by visualizing the number of people involved in vehicular accidents over several timescales, then broke that count down by many of the included variables to get a sense of the dataset. After this general overview, I examined how the number of people involved in accidents relates to specific variables, in particular injury severity, age, and alcohol involvement. Portions of the analysis focus on four New England states: Maine, New Hampshire, Vermont, and Massachusetts.
This dataset was collected by the National Highway Traffic Safety Administration (NHTSA) for the years 1996 through 2016. The data includes the following information for each person involved in a vehicular accident: state, county, month, day, year, hour, minute, manner of collision, number of vehicles involved, type of vehicle involved, number of people involved, age of driver, sex of driver, involvement of alcohol, and severity of injury.
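Because each row describes one person, counting people by year or by any other variable amounts to counting rows per group. The following is a minimal sketch of that idea, written in Python purely for illustration; it is not the R code used in the analysis. The field names (`year`, `state`, `injury_severity`, `alcohol`), the helper `count_people`, and the four sample records are all hypothetical and are not taken from the NHTSA data.

```python
from collections import Counter

# Hypothetical per-person records. Field names and values are illustrative
# assumptions, not rows from the NHTSA data.
records = [
    {"year": 1996, "state": "Maine", "injury_severity": "Fatal", "alcohol": True},
    {"year": 1996, "state": "Vermont", "injury_severity": "No injury", "alcohol": False},
    {"year": 1997, "state": "Maine", "injury_severity": "Fatal", "alcohol": False},
    {"year": 1997, "state": "Texas", "injury_severity": "Minor", "alcohol": True},
]

# The four states singled out in parts of the analysis.
LOCAL_STATES = {"Maine", "New Hampshire", "Vermont", "Massachusetts"}


def count_people(rows, *keys):
    """Count people (one row per person) grouped by the given field names."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


# People involved per year, across all states.
per_year = count_people(records, "year")

# People per year and injury severity, restricted to the four local states.
local = [r for r in records if r["state"] in LOCAL_STATES]
per_year_severity = count_people(local, "year", "injury_severity")

for key, n in sorted(per_year.items()):
    print(key, n)
```

Grouping on a different set of fields, for example `count_people(records, "state", "alcohol")`, gives the corresponding breakdown for any other pair of variables.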
National Highway Traffic Safety Administration (NHTSA). Vehicular accident data, 1996 to 2016. https://www.nhtsa.gov
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Gimond, M. (2020). Exploratory Data Analysis in R. https://mgimond.github.io/ES218/index.html