To produce the figures and charts in this story, The Dallas Morning News obtained data from four sources: the Census Bureau, the Centers for Disease Control and Prevention, Everytown for Gun Safety and the Bureau of Alcohol, Tobacco, Firearms and Explosives. All of the data analysis and cleaning was done using the Python programming language. Visualization was done with a combination of DataWrapper, QGIS, Plotly and Mapbox GL JS. To produce the map and figures for age-adjusted firearm deaths, the News downloaded 2020 firearm mortality data from the CDC website. This data was read into Python using Pandas and then sorted by the mortality rate from highest to lowest, in order to determine Texas’ status on this metric. The data was also uploaded to DataWrapper, where it was styled as a choropleth map.
Mass shootings data was obtained from the Everytown for Gun Safety website. This data goes back to 2009 and uses the FBI definition of a mass shooting, which is an incident where four or more people were killed. At the time the data was downloaded it did not contain information on the Uvalde shooting, so this data was inputted into the spreadsheet manually. This data was read into Python and grouped by state to determine the figures the number of people killed, the number of people wounded and the number of people shot in mass shootings, as well as the number of these shootings that had occurred. This aggregated data was exported as a separate spreadsheet and joined to a Census shapefile of states in QGIS. After completing the join, the data layer was exported as a geojson file and uploaded to Mapbox Studio, along with the original Everytown data. The map was styled so that the darker shaded states indicated more mass shootings and larger circles indicated more people killed in a mass shooting.
A number of different ATF data files were used to produce the figures in this story. In order to determine which state had the most gun dealers (defined as either a firearm dealer or a pawnbroker that sells firearms), The News downloaded May 2022 licensing data from the ATF website and uploaded it to Geocodio to obtain the coordinate points for each record. This data was then read into a Python notebook for cleaning and analysis. To clean the data, The News converted all columns to lowercase and renamed some columns to be shorter. There were 33 duplicate records (out of an initial 81,933 rows), where the license name, type and address matched; these were dropped from the data. The data was then filtered down to only include records of type 1 or 2 (gun dealers and gun-selling pawnbrokers) and then grouped by state to obtain the number of dealers. A median and mean feature was then engineered by taking these respective values from the count column. Then the count for each state was divided by these median and mean values to create two new features, showing how many times larger the number of stores was greater than the measures of average. Next the data was joined to a csv file of states and their abbreviations, which was copy-pasted from the USPS website. These state names were used to perform a second join to Census data estimating each state’s population in July 2021, the latest date available. These population estimates were used to create new gun dealers per capita and gun dealers per 100,000 residents figures.
In order to determine which state had the most gun manufacturers, The News wrote a scraper using the Selenium library in Python to grab every Excel ATF licensing data file, for every state and territory in the country, every month where it was available. In total this scraping yielded 2121 component spreadsheets which were combined into a single dataframe with Pandas. A month, year and date feature was engineered for each component spreadsheet before being added to the larger database. A very small number of records which contained errors were dropped and then this larger database was exported to a CSV file. The CSV was then read into another Python notebook file for cleaning, analysis and visualization.
A large number of errors were noticed in this file. For example, there appeared to be more than 1.6 million duplicate records, where the license name, type, date, street address, city, state and zip code matched another record. These records were dropped. However even after removing these records, in 36/42 total instances where data were filtered to show the number of gun manufacturers (type 7) in Texas for a given month, every license name in the file was present at least twice. When visualized, the News also noticed strange dips in the data, where thousands of records of gun manufacturers would disappear for one month and return in the next. A spokeswoman from the Dallas ATF field office was contacted regarding these apparent errors, but she has not yet responded to request for comment. A chart was generated from this data, but it was not included in the final story because of potential problems with the data.