A comparison of the last two decades (2001 - 2021) of historical weather data from Porto Alegre, Rio Grande do Sul, Brazil.
The code requires Python 3 and the general-purpose libraries that come with the Anaconda distribution.
Having lived in Porto Alegre for more than 25 years, I started to feel that our weather has been changing a little in recent years. Out of curiosity, I wanted to visualize which differences are perceptible.
This is a simple project intended only to consolidate my knowledge of pandas, matplotlib and seaborn, which I'm currently studying in more depth. I'm not offering a deep, rigorous analysis of how Porto Alegre's weather has changed over the years. My only goals are to learn and to see whether my perception of the city's temperatures is in line with the data.
I got the .csv file with historical data for Porto Alegre from the INMET (Instituto Nacional de Meteorologia) website. It's very simple to get the data the way you want it (specific variables, date range, etc.): go to https://bdmep.inmet.gov.br/ (the INMET database website) and request it; the site gives you many options to customise your data. The .csv used in this project is in this repo as "DATA_POA_2001-01-01_2022-03-11.csv".
The main goal of this project is to improve my EDA (Exploratory Data Analysis) skills, so it was divided into the following parts: data extraction, data preparation (cleaning), creation of a summarized table to facilitate the analysis, and the EDA itself (which has several specific parts: summer, winter, humidity and precipitation).
I read the data received from INMET and generated this dataframe with 7740 observations and 8 cols.
| | Data Medicao | PRECIPITACAO TOTAL, DIARIO (mm) | TEMPERATURA MAXIMA, DIARIA (C) | TEMPERATURA MEDIA, DIARIA (C) | TEMPERATURA MINIMA, DIARIA (C) | UMIDADE RELATIVA DO AR, MEDIA DIARIA (%) | UMIDADE RELATIVA DO AR, MINIMA DIARIA (%) | Unnamed: 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 2001-01-01 | 0 | 30,1 | 23,616667 | 18,4 | 68,458333 | 48.0 | NaN |
| 1 | 2001-01-02 | 0 | 32,1 | 25,475 | 20 | 69,958333 | 45.0 | NaN |
| 2 | 2001-01-03 | 0 | 33,4 | 26,345833 | 21 | 69,083333 | 43.0 | NaN |
Output of `DataFrame.info()`:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 7740 entries, 0 to 7739
    Data columns (total 8 columns):
     #   Column                                     Non-Null Count  Dtype
    ---  ------                                     --------------  -----
     0   Data Medicao                               7740 non-null   object
     1   PRECIPITACAO TOTAL, DIARIO (mm)            7210 non-null   object
     2   TEMPERATURA MAXIMA, DIARIA (C)             7448 non-null   object
     3   TEMPERATURA MEDIA, DIARIA (C)              7232 non-null   object
     4   TEMPERATURA MINIMA, DIARIA (C)             7453 non-null   object
     5   UMIDADE RELATIVA DO AR, MEDIA DIARIA (%)   7529 non-null   object
     6   UMIDADE RELATIVA DO AR, MINIMA DIARIA (%)  7628 non-null   float64
     7   Unnamed: 7                                 0 non-null      float64
    dtypes: float64(2), object(6)
    memory usage: 483.9+ KB
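A minimal sketch of how a table like the one above could be loaded. It assumes the INMET export is semicolon-separated and that every row ends with a trailing `;`, which would explain the empty `Unnamed: 7` column and the decimal commas kept as text. `load_inmet_csv` and its `skiprows` argument are hypothetical names, the real file may begin with metadata lines that need skipping, and the inline sample only stands in for the actual .csv.

```python
import io

import pandas as pd


def load_inmet_csv(source, skiprows=0):
    """Read a semicolon-separated export, keeping decimal-comma values as text."""
    # skiprows is an assumption: set it to the number of metadata lines, if any.
    return pd.read_csv(source, sep=";", skiprows=skiprows)


# Tiny stand-in for the real file (three of the original columns only).
sample = io.StringIO(
    "Data Medicao;PRECIPITACAO TOTAL, DIARIO (mm);TEMPERATURA MAXIMA, DIARIA (C);\n"
    "2001-01-01;0;30,1;\n"
    "2001-01-02;0;32,1;\n"
)
raw = load_inmet_csv(sample)
print(raw.dtypes)
```

The trailing separator makes pandas create one extra, fully empty column, which is why dropping the useless column is the first cleaning step.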
In this phase I took several steps to improve the dataframe's readability and functionality:
- Drop the useless columns
- Simplify and translate column names
- Convert str data types to datetime (date) and to float (others)
- Decrease the number of NaNs
- Drop non-useful rows
- Create 'year', 'month' and 'day' columns
- Set date as index
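The mechanical steps in this list (dropping, renaming, type conversion, date parts, index) can be sketched as below. This is an illustration under assumptions, not the exact code of the project: `prepare` and `RENAMES` are hypothetical names, the column mapping is inferred from the raw and cleaned tables in this README, and the NaN handling is left out.

```python
import pandas as pd

# Mapping inferred from the raw and cleaned tables shown in this README.
RENAMES = {
    "Data Medicao": "date",
    "PRECIPITACAO TOTAL, DIARIO (mm)": "total_precip",
    "TEMPERATURA MAXIMA, DIARIA (C)": "max_temp",
    "TEMPERATURA MEDIA, DIARIA (C)": "avg_temp",
    "TEMPERATURA MINIMA, DIARIA (C)": "min_temp",
    "UMIDADE RELATIVA DO AR, MEDIA DIARIA (%)": "avg_humidity",
    "UMIDADE RELATIVA DO AR, MINIMA DIARIA (%)": "min_humidity",
}


def prepare(raw):
    """Drop empty columns, translate names, fix dtypes and index by date."""
    unnamed = [c for c in raw.columns if str(c).startswith("Unnamed")]
    df = raw.drop(columns=unnamed).rename(columns=RENAMES)
    df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
    for col in df.columns.drop("date"):
        # Decimal commas ("30,1") become decimal points before conversion.
        as_text = df[col].astype(str).str.replace(",", ".", regex=False)
        df[col] = pd.to_numeric(as_text, errors="coerce")
    df["year"] = df["date"].dt.year
    df["month"] = df["date"].dt.month
    df["day"] = df["date"].dt.day
    return df.set_index("date")


raw = pd.DataFrame({
    "Data Medicao": ["2001-01-01", "2001-01-02"],
    "PRECIPITACAO TOTAL, DIARIO (mm)": ["0", "0"],
    "TEMPERATURA MAXIMA, DIARIA (C)": ["30,1", "32,1"],
    "TEMPERATURA MEDIA, DIARIA (C)": ["23,616667", "25,475"],
    "TEMPERATURA MINIMA, DIARIA (C)": ["18,4", "20"],
    "UMIDADE RELATIVA DO AR, MEDIA DIARIA (%)": ["68,458333", "69,958333"],
    "UMIDADE RELATIVA DO AR, MINIMA DIARIA (%)": [48.0, 45.0],
    "Unnamed: 7": [float("nan"), float("nan")],
})
clean = prepare(raw)
print(clean)
```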
To reduce the number of NaNs while losing as few rows as possible, I fill missing avg_temp values with the mean of the min and max temperatures, in the rows that have both min and max but no average.
As I don't plan to apply ML models to this dataset right now, the best choice for rows that contain too many NaNs is to drop them completely.
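A small sketch of these two steps, under assumptions: the README does not say how many NaNs count as "too many", so the `min_valid` threshold below is a made-up parameter, and `fill_avg_and_drop` is a hypothetical name. Note that the midpoint of min and max is only an approximation of the true daily mean.

```python
import numpy as np
import pandas as pd

MEASURED = ["total_precip", "max_temp", "avg_temp",
            "min_temp", "avg_humidity", "min_humidity"]


def fill_avg_and_drop(df, min_valid=3):
    """Fill avg_temp from (min + max) / 2, then drop rows with too few values."""
    out = df.copy()
    # The midpoint is NaN whenever min or max is missing, so those rows stay NaN.
    midpoint = (out["min_temp"] + out["max_temp"]) / 2
    out["avg_temp"] = out["avg_temp"].fillna(midpoint)
    # Keep rows with at least `min_valid` non-null measurements (assumed cutoff).
    return out.dropna(subset=MEASURED, thresh=min_valid)


sample = pd.DataFrame({
    "total_precip": [0.0, 1.2, 0.0],
    "max_temp": [30.0, np.nan, 33.4],
    "avg_temp": [np.nan, np.nan, 26.3],
    "min_temp": [20.0, np.nan, 21.0],
    "avg_humidity": [68.5, np.nan, 69.1],
    "min_humidity": [48.0, np.nan, 43.0],
})
result = fill_avg_and_drop(sample)
print(result)
```

In the sample, the first row gets its average filled in, while the second row has a single valid value and is dropped.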
date | total_precip | max_temp | avg_temp | min_temp | avg_humidity | min_humidity | year | month | day |
---|---|---|---|---|---|---|---|---|---|
2001-01-01 | 0.0 | 30.1 | 23.616667 | 18.4 | 68.458333 | 48.0 | 2001 | 1 | 1 |
2001-01-02 | 0.0 | 32.1 | 25.475000 | 20.0 | 69.958333 | 45.0 | 2001 | 1 | 2 |
2001-01-03 | 0.0 | 33.4 | 26.345833 | 21.0 | 69.083333 | 43.0 | 2001 | 1 | 3 |
Output of `DataFrame.info()`:

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 7483 entries, 2001-01-01 to 2022-03-10
    Data columns (total 9 columns):
     #   Column        Non-Null Count  Dtype
    ---  ------        --------------  -----
     0   total_precip  7209 non-null   float64
     1   max_temp      7448 non-null   float64
     2   avg_temp      7425 non-null   float64
     3   min_temp      7432 non-null   float64
     4   avg_humidity  7481 non-null   float64
     5   min_humidity  7455 non-null   float64
     6   year          7483 non-null   int64
     7   month         7483 non-null   int64
     8   day           7483 non-null   int64
    dtypes: float64(6), int64(3)
    memory usage: 584.6 KB
After cleaning & preparation: 7483 observations and 9 cols.
In this second part of data cleaning/preparation we summarize the data from our dataframe into another dataframe. We first split the data by season (summer and winter) and by year into separate tables. After that we put it all together in a summarized table called 'seasons'.
Year | SUM_temp | SUM_max | SUM_max_avg | SUM_min | SUM_min_avg | SUM_hum | SUM_hum_min | WIN_temp | WIN_max | WIN_max_avg | WIN_min | WIN_min_avg | WIN_hum | WIN_hum_min |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2001 | 24.68 | 37.9 | 30.56 | 14.9 | 20.58 | 69.79 | 49.92 | 16.66 | 30.9 | 21.85 | 2.6 | 12.77 | 77.11 | 58.29 |
2002 | 25.47 | 38.0 | 31.20 | 15.7 | 21.11 | 71.35 | 50.67 | 15.71 | 33.7 | 20.57 | 3.4 | 12.05 | 77.31 | 55.75 |
2003 | 23.93 | 35.8 | 29.78 | 15.1 | 19.74 | 70.27 | 55.64 | 15.26 | 34.5 | 20.58 | 3.0 | 11.26 | 76.22 | 53.50 |
2004 | 24.57 | 39.4 | 30.89 | 15.0 | 20.19 | 67.79 | 45.67 | 15.60 | 36.7 | 21.00 | 2.7 | 11.54 | 76.94 | 45.25 |
2005 | 24.63 | 38.7 | 30.47 | 13.5 | 20.60 | 70.91 | 53.08 | 16.34 | 31.9 | 21.53 | 1.9 | 12.50 | 77.30 | 49.54 |
2006 | 24.87 | 35.8 | 30.53 | 15.1 | 20.70 | 71.96 | 52.46 | 15.94 | 32.3 | 21.17 | 2.7 | 12.12 | 77.37 | 55.08 |
2007 | 24.21 | 37.3 | 30.05 | 14.4 | 20.06 | 71.43 | 56.46 | 14.62 | 33.2 | 19.73 | 2.2 | 10.78 | 78.32 | 48.29 |
2008 | 23.66 | 35.3 | 29.16 | 13.9 | 19.82 | 73.34 | 55.25 | 14.92 | 31.8 | 19.80 | 2.3 | 11.40 | 78.33 | 57.12 |
2009 | 25.17 | 38.5 | 30.71 | 13.7 | 21.47 | 73.96 | 52.38 | 14.76 | 33.4 | 19.87 | 0.3 | 10.92 | 77.64 | 42.42 |
2010 | 24.55 | 36.2 | 30.12 | 14.1 | 20.99 | 75.54 | 63.04 | 15.34 | 32.9 | 20.23 | 2.8 | 11.79 | 78.64 | 58.00 |
2011 | 24.85 | 38.4 | 31.08 | 14.7 | 20.55 | 71.39 | 54.67 | 14.29 | 32.4 | 19.36 | 1.9 | 10.71 | 79.40 | 60.75 |
2012 | 23.61 | 39.0 | 29.35 | 14.4 | 19.69 | 72.25 | 52.29 | 16.46 | 33.0 | 22.20 | 1.1 | 12.34 | 76.24 | 49.17 |
2013 | 25.76 | 40.6 | 31.93 | 17.4 | 21.60 | 72.23 | 41.71 | 14.89 | 35.1 | 20.31 | 1.4 | 11.08 | 80.03 | 55.96 |
2014 | 24.86 | 36.5 | 30.35 | 14.4 | 21.13 | 76.41 | 60.62 | 15.88 | 34.3 | 21.16 | 3.6 | 12.16 | 81.22 | 56.17 |
2015 | 25.07 | 38.9 | 30.73 | 16.9 | 21.17 | 76.59 | 52.71 | 17.21 | 34.8 | 22.16 | 5.7 | 13.74 | 81.11 | 47.62 |
2016 | 25.32 | 38.3 | 31.09 | 13.6 | 21.47 | 78.17 | 65.25 | 14.72 | 32.9 | 19.84 | 4.1 | 11.14 | 81.85 | 57.88 |
2017 | 24.10 | 36.8 | 30.10 | 13.9 | 20.06 | 77.08 | 53.08 | 17.66 | 34.8 | 23.49 | 5.5 | 13.89 | 81.34 | 60.83 |
2018 | 25.28 | 38.5 | 31.03 | 16.9 | 21.42 | 77.38 | 61.17 | 14.92 | 32.9 | 20.27 | 2.6 | 11.19 | 84.34 | 64.25 |
2019 | 25.20 | 40.3 | 31.94 | 14.6 | 20.58 | 68.51 | 51.62 | 16.11 | 36.1 | 21.81 | 2.2 | 12.24 | 79.47 | 60.04 |
2020 | 24.67 | 38.3 | 30.71 | 15.7 | 20.58 | 72.95 | 48.71 | 15.56 | 31.3 | 21.19 | 2.7 | 11.52 | 78.93 | 57.33 |
2021 | 25.52 | 40.3 | 32.17 | 17.4 | 21.01 | 71.67 | 58.08 | 15.35 | 34.5 | 20.68 | 2.3 | 11.66 | 79.13 | 55.38 |
- SUM : Summer
- WIN : Winter
- SUM/WIN_temp --> The average for that summer's/winter's temperatures
- SUM/WIN_max --> The highest temperature for that summer/winter
- SUM/WIN_min --> The lowest temperature for that summer/winter
- SUM/WIN_max_avg --> The average for that summer's/winter's daily maximum temperatures
- SUM/WIN_min_avg --> The average for that summer's/winter's daily minimum temperatures
- SUM/WIN_hum --> The average of the daily humidities of that summer/winter
- SUM/WIN_hum_min --> The lowest humidity of that summer/winter
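The column definitions above can be sketched with a `groupby` per season. Everything about the season boundaries here is an assumption: the README does not state which dates delimit summer and winter, so `SEASON_MONTHS` uses calendar months as a rough stand-in. Taking `*_hum_min` as the minimum of the daily average humidity is also a guess, and `summarize_seasons` is a hypothetical name.

```python
import pandas as pd

# Assumed season boundaries (calendar months), not taken from the project.
SEASON_MONTHS = {"SUM": (1, 2, 3), "WIN": (7, 8, 9)}


def summarize_seasons(df):
    """Build one row per year with summer (SUM_*) and winter (WIN_*) columns."""
    parts = []
    for prefix, months in SEASON_MONTHS.items():
        by_year = df[df["month"].isin(months)].groupby("year")
        parts.append(pd.DataFrame({
            f"{prefix}_temp": by_year["avg_temp"].mean(),
            f"{prefix}_max": by_year["max_temp"].max(),
            f"{prefix}_max_avg": by_year["max_temp"].mean(),
            f"{prefix}_min": by_year["min_temp"].min(),
            f"{prefix}_min_avg": by_year["min_temp"].mean(),
            f"{prefix}_hum": by_year["avg_humidity"].mean(),
            # Guess: lowest daily average humidity of the season.
            f"{prefix}_hum_min": by_year["avg_humidity"].min(),
        }))
    return pd.concat(parts, axis=1).round(2)


daily = pd.DataFrame({
    "year": [2001, 2001, 2001, 2001, 2001],
    "month": [1, 2, 5, 7, 8],
    "avg_temp": [24.0, 26.0, 19.0, 14.0, 16.0],
    "max_temp": [30.0, 34.0, 25.0, 20.0, 24.0],
    "min_temp": [18.0, 20.0, 12.0, 4.0, 8.0],
    "avg_humidity": [70.0, 60.0, 75.0, 80.0, 76.0],
})
seasons = summarize_seasons(daily)
print(seasons)
```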
In this phase I try to answer a few questions related to my impression that Porto Alegre is getting hotter.
Questions to answer:
- Did the average temperature rise in the last 7 years?
- Did the maximum temperature rise?
To answer these questions, my strategy was to plot the summer columns of the summarised dataframe.
Judging from the graph above, the average and maximum temperatures do seem to have gotten a little higher. Many interesting insights could be drawn from this plot, but my general impression is that all the variables are rising on average.
To make this change in the averages easier to see, I plotted a new graph with a central line representing the mean of all the yearly summer averages over these two decades (24.76 °C). On the same graph, bars show how far each year's average is from that mean, i.e. each year's deviation from it.
As we can see in the plot above, the yearly summer average temperature is increasing. On average, summers in the first decade of the century were cooler and summers in the second decade were warmer.
This analysis reinforces the idea that Porto Alegre's summers are getting hotter. My perception that our summers are getting hotter seems to be in line with the data.
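A sketch of such a deviation plot, using the `SUM_temp` values from the 'seasons' table. The styling (colors, figure size, labels) and the output file name `summer_deviation.png` are placeholders, not the original plot.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# SUM_temp column of the 'seasons' table, 2001 to 2021.
sum_temp = pd.Series(
    [24.68, 25.47, 23.93, 24.57, 24.63, 24.87, 24.21,
     23.66, 25.17, 24.55, 24.85, 23.61, 25.76, 24.86,
     25.07, 25.32, 24.10, 25.28, 25.20, 24.67, 25.52],
    index=range(2001, 2022),
    name="SUM_temp",
)
overall_mean = sum_temp.mean()
deviation = sum_temp - overall_mean  # positive = warmer than the 21-year mean

fig, ax = plt.subplots(figsize=(10, 4))
colors = ["tab:red" if d > 0 else "tab:blue" for d in deviation]
ax.bar(list(deviation.index), deviation.values, color=colors)
ax.axhline(0, color="black", linewidth=1)  # the central line is the mean itself
ax.set_xlabel("Year")
ax.set_ylabel("Deviation from mean summer temperature (°C)")
ax.set_title(f"Summer average vs. 2001-2021 mean ({overall_mean:.2f} °C)")
fig.savefig("summer_deviation.png")
```

Plotting deviations instead of raw values puts the zero line at the long-run mean, so warmer and cooler summers separate visually.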
Period | Summer avg temperature | Avg of highest summer temp | Avg of daily summer max temps | Avg of lowest summer temp | Avg of daily summer min temps |
---|---|---|---|---|---|
2001 - 2007 | 24.62 | 37.56 | 30.50 | 14.81 | 20.43 |
2008 - 2014 | 24.64 | 37.79 | 30.39 | 14.66 | 20.75 |
2015 - 2021 | 25.02 | 38.77 | 31.11 | 15.57 | 20.90 |
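For reference, the first column of this table can be reproduced from the `SUM_temp` values of the 'seasons' table by averaging over 7-year periods. This is a sketch using `pd.cut`; the names `bins`, `labels` and `period_means` are illustrative.

```python
import pandas as pd

# SUM_temp column of the 'seasons' table, 2001 to 2021.
sum_temp = pd.Series(
    [24.68, 25.47, 23.93, 24.57, 24.63, 24.87, 24.21,
     23.66, 25.17, 24.55, 24.85, 23.61, 25.76, 24.86,
     25.07, 25.32, 24.10, 25.28, 25.20, 24.67, 25.52],
    index=pd.Index(range(2001, 2022), name="year"),
)

# Right-closed bins: (2000, 2007], (2007, 2014], (2014, 2021].
bins = [2000, 2007, 2014, 2021]
labels = ["2001 - 2007", "2008 - 2014", "2015 - 2021"]
period = pd.cut(sum_temp.index.to_numpy(), bins=bins, labels=labels)

period_means = sum_temp.groupby(period, observed=True).mean().round(2)
print(period_means)
```

The same grouping applied to the other columns gives the remaining columns of the table.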
With the plots shown before and this table of means for several summer variables, I feel confident saying that my perception is at least coherent with the data: Porto Alegre has had higher temperatures and higher averages during the last 7 years, but the difference is not huge.
We can't confirm with certainty that my perception is related to any real change in the city's climate, but it certainly points in the same direction as what some studies have already found (I link an interesting one below): climate change is starting to be noticeable to the people who live in the city.
Climate change in Rio Grande do Sul, by Bibiana Dávila at UFRGS
Questions to answer:
- Did the mean temperature increase?
- Did the average minimum temperature increase?
To answer these questions, my strategy was to plot the winter columns of the summarised dataframe.
Period | Winter avg temperature | Avg of highest winter temp | Avg of daily winter max temps | Avg of lowest winter temp | Avg of daily winter min temps |
---|---|---|---|---|---|
2001 - 2007 | 15.73 | 33.31 | 20.92 | 2.64 | 11.86 |
2008 - 2014 | 15.22 | 33.27 | 20.42 | 1.91 | 11.49 |
2015 - 2021 | 15.93 | 33.90 | 21.35 | 3.59 | 12.20 |
In winter we can see a pattern similar to the summer one: the average temperature and the maximum and minimum variables are all (somewhat) increasing. Every variable rose between the first and the last period, but the maximum averages and the two minimum variables (average and lowest temperature) rose the most.
The average temperatures are getting higher, and so are all the other metrics. Answering our questions: yes, the mean temperature increased, and so did the minimum temperatures (both the lowest values and their averages).
Again, the difference that residents of POA perceive is consistent with what the data shows.
Questions to answer:
- Did the humidity averages change over these years?
- Do they follow the changes in the other variables?
Period | Avg humidity | Avg min humidity |
---|---|---|
2001 - 2007 | 74.09 | 51.16 |
2008 - 2014 | 75.47 | 52.34 |
2015 - 2021 | 77.85 | 54.83 |
The average humidity and minimum humidity stayed at similar levels over these 20 years, although they seem to be increasing too. I'm no specialist, so I can't assert anything, even if there is a small increase. It may be ordinary variation or something else; I would need to do more research to understand it further.
Answering the questions: they seem to be increasing, although the difference is subtle. As all the other variables seem to be increasing too, I can say that this change follows the changes in the other metrics.
Questions to answer:
- Did the precipitation totals change over the years?
Period | Sum of precipitation (mm) |
---|---|
2001 - 2007 | 7532.4 |
2008 - 2014 | 10567.8 |
2015 - 2021 | 10809.8 |
Period | Sum of precipitation (mm) |
---|---|
2001 - 2010 | 12258.6 |
2011 - 2020 | 15431.4 |
2021 (single year) | 1220.0 |
In this last table we can see that we had considerably more rain in the last decade, 2011 to 2020. To be specific, the 2011-2020 total exceeds the 2001-2010 total by 3172.8 mm.
This tendency towards more precipitation is expected in the climate change context of the state we are in. Rio Grande do Sul is affected by the climate changes caused by deforestation of the Amazon rainforest, which intensifies precipitation in our region in more than one way.
The Medium article for this project can be found in this repository. If you want to contact me or have any question about the analysis, feel free to reach me at https://www.linkedin.com/in/attrindade/.
After this EDA I'm planning to explore other aspects of this data (like thermal amplitude), and if I can get more data (other years) I will think about doing some forecasting.
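A sketch of how decade totals like these could be computed from the daily dataframe, plus a check of the difference quoted above. The function name `precip_by_decade` and the toy data are illustrative; decades are assumed to run 2001-2010, 2011-2020, and so on, as in the table.

```python
import numpy as np
import pandas as pd


def precip_by_decade(df):
    """Sum daily precipitation per decade, labelled by the decade's first year."""
    # 2001..2010 -> 2001, 2011..2020 -> 2011, 2021 -> 2021
    decade_start = (df["year"] - 1) // 10 * 10 + 1
    return df.groupby(decade_start)["total_precip"].sum()  # NaNs are skipped


# Toy daily data, only to show the grouping.
daily = pd.DataFrame({
    "year": [2001, 2010, 2010, 2011, 2020, 2021],
    "total_precip": [10.0, 5.5, np.nan, 20.0, 2.5, 7.0],
})
totals = precip_by_decade(daily)
difference = totals.loc[2011] - totals.loc[2001]

# Difference between the two full decades reported in the table above.
readme_gap = round(15431.4 - 12258.6, 1)
print(totals, difference, readme_gap)
```

Since the last group contains only 2021, its total is not comparable with the two full decades.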
This will only happen in the future, so see you next time!