A Python notebook for analyzing pollution CSV data.
- Aims to analyze the change in the air quality index (AQI) for 4 major pollutants (Nitrogen Dioxide, Sulphur Dioxide, Carbon Monoxide, Ozone) across all 50 states of the United States from 2000 to 2016
- Procured from the Environmental Protection Agency (EPA)
- Consists of daily pollution data for the 4 major pollutants over 16 years for all 50 states (382 MB, 1746661 lines)
- Key points:
- Organized by date; dates repeat for each county where source data was collected
- NO2 and SO2 are in parts per billion while O3 and CO are in parts per million
- The max hour describes what hour of the day the AQI was highest
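
Because the pollutants are reported in different units, raw concentrations are not directly comparable until they share a scale. A minimal sketch of that conversion follows; the helper `to_ppm` and the sample readings are hypothetical and are not taken from the notebook or the EPA data.

```python
PPB_PER_PPM = 1000.0  # 1 part per million = 1000 parts per billion


def to_ppm(value, unit):
    """Convert a concentration to parts per million (hypothetical helper)."""
    if unit == "ppm":
        return value
    if unit == "ppb":
        return value / PPB_PER_PPM
    raise ValueError(f"unknown unit: {unit!r}")


# Made-up readings for illustration only: NO2 and SO2 in ppb, O3 and CO in ppm.
readings = {
    "NO2": (20.0, "ppb"),
    "SO2": (3.0, "ppb"),
    "O3": (0.03, "ppm"),
    "CO": (0.5, "ppm"),
}

# Put every pollutant on the same ppm scale.
in_ppm = {name: to_ppm(value, unit) for name, (value, unit) in readings.items()}
```

The AQI itself is a unitless index, so this step only matters when comparing concentration values rather than AQI values.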
- Reading Data:
- Pandas to read the CSV
- Data was relatively unorganized but contained diverse information
- Grouped data by date
- Calculated the means for each state
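
The reading and grouping steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the column names (`State`, `County`, `Date Local`, `NO2 AQI`, `O3 AQI`) and the tiny inline sample are assumed, and the real CSV would be read from a file path instead of a string.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for the EPA CSV; column names are assumptions.
SAMPLE_CSV = """State,County,Date Local,NO2 AQI,O3 AQI
Arizona,Maricopa,2000-01-01,40,30
Arizona,Pima,2000-01-01,20,10
Arizona,Maricopa,2000-01-02,30,20
Texas,Harris,2000-01-01,50,40
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Date Local"])

# Dates repeat per county, so average across counties for each state and date.
daily_state_means = (
    df.groupby(["State", "Date Local"])[["NO2 AQI", "O3 AQI"]]
    .mean()
    .reset_index()
)

# Overall mean per state across all dates.
state_means = daily_state_means.groupby("State")[["NO2 AQI", "O3 AQI"]].mean()
```

Averaging per state and date first keeps states with many reporting counties from dominating the state-level mean.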
- Visualization:
- Matplotlib - Graphs
- Cartopy - Maps
- Types of Visualizations:
- Multi-line charts
- Bubble maps
- Multivariate linear regressions
- Heat maps
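
As an illustration of the multi-line chart type, here is a minimal Matplotlib sketch with one line per pollutant. The yearly values are placeholders invented for the example and do not reflect the notebook's results; the variable names are likewise assumptions.

```python
import matplotlib.pyplot as plt

# Placeholder yearly mean AQI values for illustration only, not actual results.
years = [2000, 2001, 2002, 2003]
yearly_aqi = {
    "NO2": [30, 28, 27, 25],
    "SO2": [10, 10, 9, 9],
    "CO": [12, 11, 9, 8],
    "O3": [35, 36, 34, 35],
}

fig, ax = plt.subplots(figsize=(8, 4))

# One line per pollutant on shared axes.
for pollutant, values in yearly_aqi.items():
    ax.plot(years, values, marker="o", label=pollutant)

ax.set_xlabel("Year")
ax.set_ylabel("Mean AQI")
ax.set_title("Mean AQI per pollutant (placeholder data)")
ax.set_xticks(years)
ax.legend()
fig.tight_layout()
```

In a notebook the figure renders inline; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.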
Line Charts | Map Visualizations |
---|---|
- CO and NO2 have decreased while O3 and SO2 have largely remained stagnant
- In recent years, O3 and NO2 have been more prevalent than CO and SO2
- When looking at results, be aware of:
- Results may be affected by gaps in the data, as shown in the multi-line charts and the heat maps
- Distinguish between results based on the actual AQI and results based on the change in AQI across various factors
- Analyzing each pollutant on a case-by-case basis is necessary to understand trends and how factors such as laws may affect pollution
- Awareness of air pollution is one big factor in the slow decline in some pollutants
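
Since gaps in the data can affect the results, it may help to check which days have no readings before interpreting a trend. The sketch below shows one possible way to do this with Pandas; the dates are hypothetical and the approach is an assumption, not the notebook's method.

```python
import pandas as pd

# Hypothetical observation dates for one state; the real data differs.
observed = pd.to_datetime(
    ["2000-01-01", "2000-01-02", "2000-01-05", "2000-01-06"]
)

# Every calendar day between the first and last observation.
expected = pd.date_range(observed.min(), observed.max(), freq="D")

# Days in the expected range with no reading at all.
missing = expected.difference(observed)

# Share of days in the range that have at least one reading.
coverage = 1 - len(missing) / len(expected)
```

Running a check like this per state and pollutant would indicate where the charts and heat maps are likely to show holes.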