This is a small package that I have developed to look at some of the Alberta-specific COVID-19 data.
pip install covid_alberta
The albertaC19 class scrapes the updated stats off of the Alberta COVID-19 website.
Example of using the web scraper:
import covid_alberta
abC19scaper = covid_alberta.albertaC19(outputfolder="")
# I don't plan on writing the data to disk in this example, hence the keyword arguments
ab_totals, ab_regions, ab_testing = abC19scaper.scrape_all(fltypes=None, return_dataframes=True)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-3-cea1377c44fe> in <module>
1 abC19scaper = covid_alberta.albertaC19(outputfolder="")
2 # I don't plan on writing out the data in this example thus the keywords
----> 3 ab_totals, ab_regions, ab_testing = abC19scaper.scrape_all(fltypes=None, return_dataframes=True)
c:\Repositories_C\covid_alberta\covid_alberta\webscraper.py in scrape_all(self, totalfl, regionsfl, testfl, fltypes, combine_dataframes, return_dataframes)
335
336 '''
--> 337 totals = self.scrape_albertaTotals(output_filename=totalfl, fltypes=fltypes, return_dataframe=return_dataframes)
338 regions = self.scrape_albertaRegions(output_filename=regionsfl, fltypes=fltypes, return_dataframe=return_dataframes)
339 testing = self.scrape_albertaTesting(output_filename=testfl, fltypes=fltypes, return_dataframe=return_dataframes)
c:\Repositories_C\covid_alberta\covid_alberta\webscraper.py in scrape_albertaTotals(self, output_filename, fltypes, update_figure_order, return_dataframe)
177 # Scrape the data
178 ab_cumulative = json.loads(totals_results[fig_order['cum_cases']].string)
--> 179 ab_daily_cases = json.loads(totals_results[fig_order['daily_cases']].string)
180 ab_case_status = json.loads(totals_results[fig_order['case_status']].string)
181
IndexError: list index out of range
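The traceback above comes from indexing the list of scraped figures with a position that no longer exists, which can happen when the source page adds, removes, or reorders its figures. Below is a minimal sketch of a guard that checks the positions before indexing. The helper `check_figure_order` and the sample values are hypothetical and are not part of covid_alberta.

```python
def check_figure_order(results, fig_order):
    """Raise a descriptive error if any expected figure position is missing.

    Hypothetical helper, not part of covid_alberta. `results` is the list of
    scraped page elements and `fig_order` maps a figure name to its position.
    """
    missing = {name: pos for name, pos in fig_order.items()
               if not 0 <= pos < len(results)}
    if missing:
        raise ValueError(
            f"Found {len(results)} figures, but expected positions {missing}. "
            "The page layout may have changed."
        )


# Illustrative values only: two scraped figures, three expected positions
scraped = ["<cumulative figure>", "<case status figure>"]
expected = {"cum_cases": 0, "daily_cases": 2, "case_status": 1}

try:
    check_figure_order(scraped, expected)
    message = "ok"
except ValueError as err:
    message = str(err)
print(message)
```

A check like this turns a bare IndexError into a message that names the figure that could not be found.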
Now we can show the DataFrames:
ab_totals.tail()
Date | cum_cases | Confirmed_count | Probable_count | Daily_count | Active_cum | Died_cum | Recovered_cum |
---|---|---|---|---|---|---|---|
2020-04-04 | 1250 | 38 | 19 | 57 | 618 | 23 | 322 |
2020-04-05 | 1308 | 35 | 23 | 58 | 676 | 24 | 382 |
2020-04-06 | 1344 | 20 | 16 | 36 | 712 | 27 | 449 |
2020-04-07 | 1409 | 39 | 26 | 65 | 776 | 27 | 518 |
2020-04-08 | 1423 | 9 | 5 | 14 | 876 | 29 | 518 |
ab_regions.tail()
Date | Calgary_cumulative | Central_cumulative | Edmont_cumulative | North_cumulative | South_cumulative | Unknown_cumulative |
---|---|---|---|---|---|---|
2020-04-04 | 778 | 61 | 315 | 75 | 19 | 2 |
2020-04-05 | 801 | 65 | 340 | 79 | 21 | 2 |
2020-04-06 | 821 | 65 | 348 | 86 | 22 | 2 |
2020-04-07 | 854 | 72 | 364 | 94 | 23 | 2 |
2020-04-08 | 860 | 72 | 368 | 95 | 26 | 2 |
ab_testing.tail()
Date | test_count |
---|---|
2020-04-04 | 1737 |
2020-04-05 | 1112 |
2020-04-06 | 1129 |
2020-04-07 | 1319 |
2020-04-08 | 459 |
These are all pandas DataFrames. For more info on using pandas check out the pandas cookbook.
These are functions that I have started working on for some quick analyses of the data, the main one being doubling times.
The calculate_doublingtimes function returns 2 columns:
dtime is the doubling time, in days, that takes the count from the first reported case to today's case count.
dtime_rw is a rolling window calculation. If your window is 6 days, it looks at what the doubling time, starting from the case count 6 days ago, would have to be to get to today's case count.
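As a rough sketch of the idea (not the package's actual implementation), assuming constant exponential growth, the doubling time over a span of t days is t * ln(2) / ln(N_end / N_start). The function names and the `window` parameter below are hypothetical and chosen only for illustration.

```python
import math


def doubling_time(count_start, count_end, days_elapsed):
    """Days needed for the count to double, assuming constant exponential growth.

    Returns NaN when the value is undefined (no growth or no elapsed time).
    """
    if count_start <= 0 or count_end <= count_start or days_elapsed <= 0:
        return float("nan")
    return days_elapsed * math.log(2) / math.log(count_end / count_start)


def doubling_times(counts, window=6):
    """Return (dtime, dtime_rw) lists for a list of daily cumulative counts.

    dtime compares each day with the first day in the list;
    dtime_rw compares each day with the count `window` days earlier.
    """
    dtime, dtime_rw = [], []
    for day, count in enumerate(counts):
        dtime.append(doubling_time(counts[0], count, day))
        if day >= window:
            dtime_rw.append(doubling_time(counts[day - window], count, window))
        else:
            dtime_rw.append(float("nan"))
    return dtime, dtime_rw


# Synthetic series that doubles every day, so both measures should be about 1 day
counts = [1, 2, 4, 8, 16, 32, 64, 128]
dtime, dtime_rw = doubling_times(counts, window=3)

# cum_cases from the ab_totals table above: 2020-04-04 (1250) to 2020-04-07 (1409)
short_window = doubling_time(1250, 1409, 3)
```

Undefined values are returned as NaN in this sketch, which may differ from how calculate_doublingtimes handles them.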
I started off looking at the rolling window calculation. However, the more I look into it, the less happy I am with using the rolling window. Our information about COVID-19 cases is changing so rapidly that the rolling window calculation tends to be too noisy and too optimistic to be useful. We can calculate both below and see what they look like:
totals_dt = covid_alberta.calculate_doublingtimes(ab_totals, col_suffix="cum_cases", combine_df=False)
regions_dt = covid_alberta.calculate_doublingtimes(ab_regions, col_suffix="cumulative", combine_df=False)
totals_dt.tail()
Date | dtime | dtime_rw |
---|---|---|
2020-04-04 | 2.818897 | 7.119992 |
2020-04-05 | 2.897670 | 7.353586 |
2020-04-06 | 2.982973 | 9.613334 |
2020-04-07 | 3.059140 | 11.617191 |
2020-04-08 | 3.150442 | 17.176893 |
regions_dt.tail()
Date | Calgary_dtime | Calgary_dtime_rw | Central_dtime | Central_dtime_rw | Edmont_dtime | Edmont_dtime_rw | North_dtime | North_dtime_rw | South_dtime | South_dtime_rw | Unknown_dtime | Unknown_dtime_rw |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2020-04-04 | 3.019693 | 7.296903 | 4.046714 | 14.735665 | 3.872364 | 5.864623 | 3.692514 | 8.141493 | 4.472769 | 7.609425 | 0 | 0 |
2020-04-05 | 3.110208 | 7.587349 | 4.151191 | 12.826571 | 3.956375 | 6.261873 | 3.807239 | 8.008629 | 4.553405 | 6.431655 | 0 | 0 |
2020-04-06 | 3.202070 | 9.970858 | 4.317239 | 18.637702 | 4.082834 | 8.636192 | 3.890285 | 7.959255 | 4.709120 | 6.000000 | 0 | 0 |
2020-04-07 | 3.286065 | 12.181763 | 4.376066 | 15.441420 | 4.189037 | 11.309771 | 3.966687 | 8.029614 | 4.863424 | 7.289318 | 0 | 0 |
2020-04-08 | 3.385243 | 19.656061 | 4.538143 | 20.885405 | 4.323639 | 15.835158 | 4.109679 | 9.387934 | 4.893159 | 8.566048 | 0 | 0 |
Here are some of the plots I've used for looking at the data. For this example I'm using matplotlib. Plotly creates nice plots but is a little harder to include in this documentation since it's hosted on GitHub Pages. If you head over to my website, I'll post the Plotly code and examples of the interactive plots there.
import matplotlib.pyplot as plt
# Set defaults and settings
days_to_trim = 1
date_fmt = "%B %d"
# Grab the data we want for the plots and trim the last day off
plt_totals = ab_totals[:-days_to_trim]
plt_total_dt = totals_dt[:-days_to_trim]
plt_regions = ab_regions[:-days_to_trim]
plt_regions_dt = regions_dt[:-days_to_trim]
# use a format dictionary so I only have to set them in one location
# .iloc[-1] selects the last row by position (plain [-1] on a date-indexed Series is deprecated)
fmt = {'alb': {'x_data': plt_totals['cum_cases'],
               'y_data': plt_total_dt['dtime'],
               'last_date': plt_totals.index.strftime(date_fmt)[-1],
               'annot_x': plt_totals['cum_cases'].iloc[-1],
               'annot_y': plt_total_dt['dtime'].iloc[-1],
               'color': 'green',
               'label': 'Alberta'},
       'cal': {'x_data': plt_regions['Calgary_cumulative'],
               'y_data': plt_regions_dt['Calgary_dtime'],
               'last_date': plt_regions.index.strftime(date_fmt)[-1],
               'annot_x': plt_regions['Calgary_cumulative'].iloc[-1],
               'annot_y': plt_regions_dt['Calgary_dtime'].iloc[-1],
               'color': 'orange',
               'label': 'Calgary'},
       'edm': {'x_data': plt_regions['Edmont_cumulative'],
               'y_data': plt_regions_dt['Edmont_dtime'],
               'last_date': plt_regions.index.strftime(date_fmt)[-1],
               'annot_x': plt_regions['Edmont_cumulative'].iloc[-1],
               'annot_y': plt_regions_dt['Edmont_dtime'].iloc[-1],
               'color': 'blue',
               'label': 'Edmonton'},
       }
# Setup the plot
fig, ax = plt.subplots(figsize=(8,6))
# Create the line plots using a loop and the dictionary above
for rgn in ['alb', 'cal', 'edm']:
    ax.plot(fmt[rgn]['x_data'], fmt[rgn]['y_data'],
            c=fmt[rgn]['color'], label=fmt[rgn]['label'])
# add a marker and a date annotation to the last point
for rgn in ['alb', 'cal', 'edm']:
    ax.plot(fmt[rgn]['annot_x'], fmt[rgn]['annot_y'], 'o', c=fmt[rgn]['color'])
    ax.text(fmt[rgn]['annot_x'] - 60, fmt[rgn]['annot_y'] + 0.08, fmt[rgn]['last_date'],
            fontdict={'color': fmt[rgn]['color'], 'size': 8, 'weight': 'bold'})
# fancy up the plot
ax.grid(which='both', linestyle=(0, (5, 3)), lw=0.5)
ax.legend(frameon=True, fancybox=True, shadow=True)
ax.set_ylabel('Doubling Time (Days)', fontdict={'size': 9, 'family': 'sans-serif', 'style':'italic'})
ax.set_xlabel('Cumulative Case Count', fontdict={'size': 9, 'family': 'sans-serif', 'style':'italic'})
title = ax.set_title("Alberta: Doubling Time by Cumulative Cases",
fontdict={'fontsize': 10, 'family': 'sans-serif', 'fontweight': 'bold'})