-
Notifications
You must be signed in to change notification settings - Fork 0
/
Belgium_en.Rmd
884 lines (635 loc) · 43.7 KB
/
Belgium_en.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
---
title: "An analysis of excess mortality in 2020 in Belgium"
author: "Gael Fraiteur"
output:
html_document:
toc: yes
toc_depth: 2
fig_width: 9
fig_height: 6
---
```{r setup, include=FALSE}
source("Belgium.R", local = knitr::knit_global())
knitr::opts_chunk$set(echo = FALSE, message = FALSE)
this_theme <- function() {
theme_minimal() + theme(legend.position='bottom')
}
```
## Introduction
The COVID19 epidemic has caused disruptions of our societies that have not been seen in Western Europe since World Ware II. However, it is still difficult to
understand how exceptional this epidemic actually is compared to historic mortality data. Is it a once-in-a-century event, or does it repeat every ten years?
This article attempts to shed some light on these questions.
When discussing the effect of the COVID19 epidemic, Belgium is an interesting country to study for several reasons. First, it has
the highest reported death rate from COVID19 in the world. Secondly, Statbel, the Belgian governmental agency for statistics, publishes mortality
information every week with a delay of just 3 weeks. Last and not least, this is the country where the author grew up.
Our approach is to compare the number of deaths in 2020 (for weeks 1 to `r death_2020_max_week`) with the number of deaths that would be _expected_ based on
the structure of the population on January 1^st^, 2020 and on the usual death rates. The difference between expectations and reality
gives us a metric named _excess deaths_ when it is positive, or _deaths deficit_ when it is negative.
We compare this metric with the number of deaths due to COVID as reported by Sciensano, and with historical death and mortality data.
At the end of the article, we try to answer a few controversial questions based on data.
This article, as well as all R scripts that have been used to compute the data, are hosted on GitHub at https://github.com/gfraiteur/mortality.
All data comes from open sources and can be freely downloaded. If you have a question or remark related to this article,
or if you have found a bug or inaccuracy, please open an issue on GitHub or, better, submit a pull request.
## About the author
Gael Fraiteur graduated from the Louvain School of Engineering in 2001 as a civil engineer in applied mathematics. He also holds two minor degrees
in philosophy from UCLouvain. He has worked since then in the software industry. In 2004, he started an open-source project named PostSharp. In 2009, he founded a company
to market and develop the product. As the CEO and principal engineer of PostSharp Technologies, the author now shares his time between R&D, management and marketing.
## Data sources
The data were downloaded from the following sources:
* Number of deaths: Statbel,
[Number of deaths per day, sex, age, region, province, district](https://statbel.fgov.be/en/open-data/number-deaths-day-sex-district-age).
For 2020, this data is available until the `r death_2020_max_week`^th^ week.
* Structure of population: Statbel,
[Population by place of residence, nationality, marital status, age and sex](https://statbel.fgov.be/en/open-data?category=23)
* Cause of mortality: Statbel,
[Causes of death by month, sex, age group and region](https://statbel.fgov.be/en/open-data/causes-death-month-sex-age-group-and-region)
* COVID mortality: Sciensano, [Mortality by date, age, sex, and region](https://epistat.wiv-isp.be/covid/)
* General mortality: Human Mortality Database, [Death rates for Belgium, 1x1](https://www.mortality.org/cgi-bin/hmd/country.php?cntr=BEL&level=1)
* Temperatures: Royal Meteorological Institute of Belgium,
[Automatic weather station (AWS) daily observations](https://opendata.meteo.be/geonetwork/srv/eng/catalog.search;jsessionid=B123D8FF9D843F6B8721B8878EB55479#/metadata/RMI_DATASET_AWS_1DAY)
The script gives the exact URLs of the source data files we used, as well as the commands we have used to parse and prepare the dataset.
## Building a predictive model
If we want to compare the actual number of deaths in 2020 with what would be a "normal" number of deaths, we need to define what "normal" is.
Many authors have taken the average of the past 5 or 10 years as their point of comparison. However, this choice gives numbers that are
lower than what is realistic because the number of deaths has constantly increased of about 0.5% per year in the last 10 years. Comparing 2020 to 2015
would then cause a difference of 2,500 deaths, which is already significant.
```{r}
death_per_year <-
death %>%
filter ( year < 2020 ) %>%
group_by( year ) %>%
summarize( death_observed = sum( death_observed ))
yearly_death_increase <-
coefficients( lm( death_per_year$death_observed ~ death_per_year$year) )[2] /
death_per_year %>% summarise( avg_death = mean(death_observed) )
death_per_year %>%
ggplot( aes( x = year, y = death_observed)) +
geom_line(aes()) +
geom_smooth(method='lm', se = FALSE) +
ggtitle("Number of deaths per year") +
scale_x_continuous( breaks = 2009:2019,name = "year") +
scale_y_continuous( labels=function(x) format(x, big.mark = ",", scientific = FALSE), name = "deaths") +
this_theme()
```
Our model of yearly mortality is built in three steps:
1. The first input is the structure of the population the 1^st^ of January of each year between 2009 and 2020 (i.e. the number of residents of a given age and sex alive on that day).
2. The second input is the mortality data for Belgium for the corresponding age, sex and year, i.e. the probability that
a person who was alive on January 1^st^ morning would be dead on December 31^st^ evening. The expected number of deaths for a given group and year is simply the
product of the first input with the second input.
3. We split the yearly data into weeks based on weekly historic death data.
### 1. Structure of the population
The first input of our model is the structure of the population published by Statbel (see data sources). The data set gives us the number
of inhabitants of each sex who are alive and have a specific age on January 1^st^ of a given year. This data is available since 2009.
As you can see on the next graph, the number of people aged 65 years or more is increasing every year by approximately 34,000, a 1.6% growth.
```{r}
population_structure %>%
filter( age_group %in% c("85+", "75-84", "65-74")) %>%
group_by(age_group, year) %>%
summarise( population_count = sum(population_count) ) %>%
ggplot( aes( x = year, y = population_count, fill = age_group)) +
geom_area(aes()) +
scale_x_continuous(name = "year", breaks = 2009:2020) +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE), name = "inhabitants") +
this_theme() +
ggtitle("Number of inhabitants aged 65 or more")
population_older_65 <- population_structure %>%
filter( age_group %in% c("85+", "75-84", "65-74")) %>%
group_by(year) %>%
summarise( population_count = sum(population_count) )
# lm(population_older_65$population_count ~ population_older_65$year )
# population_older_65
```
### 2. Death rate
The _death rate_ for a given period is the probability that someone of a given group (i.e. given age group and sex) who is alive
at the beginning of the period would be alive at the end of the period. Unless stated otherwise, the period is a calendar year.
Let's look at the death rate for a few ages. As you can see, the death rate is continuously decreasing except for
the age group over 100. This graph also shows a greater variance of the death rate for older age groups, which seems
to mean that these age groups are more sensitive to events that happen less than once a year, such as epidemics
or exceptional weather.
```{r}
mortality_for_graph <- mortality %>%
filter( year >= 2009 & age %in% seq( 10, 100, 10) ) %>%
group_by( age, year ) %>%
summarise( mortality = mean(mortality)) %>%
mutate( is_old = age >= 50 )
mortality_for_graph$age <- as.factor(mortality_for_graph$age)
mortality_for_graph %>%
filter( is_old == TRUE ) %>%
ggplot( aes( x = year, y = mortality, color = age) ) +
geom_line() +
scale_color_discrete() +
scale_x_continuous(name = "year") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1), name = "death rate") +
geom_smooth( method='lm', se = FALSE, linetype="dotted", weight=1) +
this_theme() +
ggtitle("Death rate for different ages over 50 in 2009-2018")
mortality_for_graph %>%
filter( is_old == FALSE ) %>%
ggplot( aes( x = year, y = mortality, color = age) ) +
geom_line() +
scale_color_discrete() +
scale_x_continuous(name = "year") +
scale_y_continuous(labels = scales::percent_format(accuracy = 0.01), name = "death rate", limits = c(0,NA)) +
geom_smooth( method='lm', se = FALSE, linetype="dotted", weight=1) +
this_theme() +
ggtitle("Death rate for different ages under 50 in 2009-2018")
```
We model the death rate with a linear regression for each age and sex. This model allows us to extrapolate
the data to 2019 and 2020, and removes the year-to-year variations for all previous years. That is, this
death rate model removes the effect of epidemics and weather conditions that happen less frequently than yearly.
Once we have a death rate model for each group and year, we multiply this coefficient by the actual
population for this group and year, which should give us the number of expected deaths. However, the expected and observed
number of deaths, summed from 2009 to 2019, don't match exactly. This discrepancy is expected and its cause is not important.
To cancel the discrepancy, we apply the following correction factors:
```{r}
pivot_wider(model_correction, names_from = sex, values_from = correction ) %>%
knitr::kable( col.names = c("Age group", "F", "M"), caption = "Correction factors", format = "markdown")
```
The following graphs shows the resulting projections of number of deaths per year and age group.
```{r}
# Graph: expected number of deaths per year and age group
death_expected_yearly %>%
group_by(year, age_group) %>%
summarise( death_expected_yearly = sum(death_expected_yearly)) %>%
ggplot( aes( x = year, y = death_expected_yearly, fill=age_group)) +
geom_area() +
ggtitle("Expected number of deaths per year") +
scale_x_continuous(name = "year", breaks = 2009:2020) +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE),
name = "deaths",
breaks = 10000 * (0:10), limits = c(0,NA)) +
this_theme() +
guides(fill = guide_legend(reverse=TRUE))
```
Let's now compare the projections with the reality:
```{r}
# Graph: expected number of deaths per year vs reality
death_expected_yearly %>%
filter(year <= 2019) %>%
group_by(year) %>%
summarise( death_expected_yearly = sum(death_expected_yearly), death_observed_yearly = sum(death_observed_yearly)) %>%
ggplot( aes(year, y = value)) +
geom_line(aes(y = death_expected_yearly, col = "expected")) +
geom_line(aes(y = death_observed_yearly, col = "real")) +
ggtitle("Expected vs observed number of deaths per year") +
scale_x_continuous(name = "year", breaks = 2009:2019) +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE)) +
this_theme()+ theme(legend.title = element_blank())
```
### 3. Splitting years into weeks
We now have a yearly model, but we need weekly projections. For our weekly model, we first compute, for each sex and age group, the percent of deaths
that happens in a given week of the year, and we multiply this coefficient with the yearly death rate for this
year. Note that we applied a 3-week centered rolling mean to the data series before aggregating per week of year.
```{r}
death_per_week_of_year %>%
group_by(week_of_year, age_group) %>%
summarise( week_death_percent = mean(week_death_percent) ) %>%
ggplot( aes( x = week_of_year, y = week_death_percent)) +
geom_line(aes(color=age_group)) +
ggtitle("Distribution of deaths per week") +
scale_x_continuous(name = "week of year") +
scale_y_continuous(labels = scales::percent_format(), name = "percent of deaths in this week") +
this_theme()
```
We can notice a higher seasonality of death rate for the more aged groups. For the younger groups, there is almost
no seasonal correlation but there is however more noise due to the lower amount of data points. We chose not to
address this problem as it should have only minor impact on the conclusions.
With this weekly model, we can finally compute the weekly predictive model and compare it with the reality:
```{r}
# Graph: expected number of deaths per week vs reality
death_expected_weekly %>%
group_by(date) %>%
summarise( death_expected = sum(death_expected), death_observed = sum(death_observed)) %>%
ungroup() %>%
filter(death_observed>1500) %>%
arrange(date ) %>%
ggplot( aes(x = date)) +
this_theme() + theme(legend.title = element_blank()) +
# geom_ribbon( aes(x = date, ymin = death_expected, ymax = death_observed, fill = death_expected >= death_observed, size =0),
# alpha = 0.4, show.legend = FALSE) +
geom_line(aes(y = death_expected, col = "expected")) +
geom_line(aes(y = death_observed, col = "real")) +
scale_x_date( date_breaks = "1 year", labels = date_format("%Y"), name = "year") +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE), name = "deaths per week", limits = c(0,5000)) +
ggtitle("Expected vs observed number of deaths per week")
```
## Excess death rate
_Excess deaths_ is the different between the actual number and the expected number of deaths. When there are fewer deaths
than expected, this is called _death deficit_.
The following graphs represents excess deaths from 2009:
```{r}
# Graph: excess death rate
death_expected_weekly %>%
group_by(date) %>%
summarise( excess_death = sum(excess_death)) %>%
ungroup() %>%
filter(excess_death>-500) %>%
ggplot( aes(x = date, y = excess_death)) +
geom_line( color = 'cyan3') +
this_theme() +
scale_x_date( date_breaks = "1 year", labels = date_format("%Y"), name = "year") +
ggtitle("Excess deaths per week")
```
## Cumulative excess mortality
Weekly excess mortality does not give a good insight of the impact of a specific _event_, such as an epidemic or a special weather condition,
over a long period of time. It does not answer the following questions:
* How many people died during a specific event?
* How long does it take to "recover" from the event?
To answer these questions, the cumulative excess mortality (i.e. the integral of the curve over time) is a better metric.
The following graphs show the cumulative death, first starting from January 1^st^, 2009, and then zooming in on 2019-2020.
```{r}
all_cum_death_expected_weekly <-
cum_death_expected_weekly %>%
group_by(date) %>%
summarise( excess_death = sum(excess_death), cum_excess_deaths = sum(cum_excess_deaths), year = mean(year)) %>%
ungroup()
all_cum_death_expected_weekly %>%
ggplot( aes(x = date)) +
geom_line(aes(y = excess_death, col = "weekly excess deaths")) +
geom_line(aes(y = cum_excess_deaths, col = "cumulative excess deaths")) +
scale_x_date( date_breaks = "1 year", labels = date_format("%Y"), name = "year") +
ggtitle("Cumulative excess deaths") +
this_theme() + theme(legend.title = element_blank()) +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE), name = "excess deaths", breaks = seq( -2000, 20000, 2000))
max_cum_death_deficit <-
(all_cum_death_expected_weekly %>%
filter(year>=2019) %>%
summarise( max_cum_death_deficit = min(cum_excess_deaths) ))$max_cum_death_deficit
max_cum_death_deficit_date <-
(all_cum_death_expected_weekly %>%
filter( cum_excess_deaths == max_cum_death_deficit ))$date
all_cum_death_expected_weekly %>%
filter(year>=2019) %>%
group_by(date) %>%
summarise( excess_death = sum(excess_death), cum_excess_deaths = sum(cum_excess_deaths)) %>%
ungroup() %>%
ggplot( aes(x = date)) +
geom_line(aes(y = excess_death, col = "weekly excess deaths")) +
geom_line(aes(y = cum_excess_deaths, col = "cumulative excess deaths")) +
geom_point(aes(x = max_cum_death_deficit_date, y = max_cum_death_deficit), color = "green") +
scale_x_date( date_breaks = "1 month", labels = date_format("%m"), name = "month") +
ggtitle("Cumulative excess deaths (2020)") +
this_theme() +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE), name = "excess deaths", breaks = seq( -2000, 20000, 2000)) +
theme(legend.title = element_blank())
```
Note that the initial value of the cumulative metric is arbitrary. Here, we chose that the zero point would be January 1^st^, 2009. Because
our model is based on a linear regression over years, and because of the yearly fluctuations, we can expect that the zero point is crossed
every second year.
The second arbitrary zero is the one of January 1^st^, 2020. This one is the consequence of the calibration of our model with real data for the period from 2009
to 2019. It is important to understand how arbitrary this zero is.
The cumulative maximal death deficit was observed on March 11th, 2020, with a value of `r abs(round(max_cum_death_deficit))` (point represented in green in the
graph above). So when comparing the expected cumulative mortality with the real cumulative mortality, a difference of `r abs(round(max_cum_death_deficit))` can
be obtained just by choosing a different beginning of the comparison time window.
### Excess death rate by age group
Taking this remark into account, we can now look at the increased death rate for the year 2020 (between weeks 1 and `r death_2020_max_week`) for different age
groups. The following table shows the difference in death rate during the studied period in absolute terms and in terms that are relative to the expected values
for 2020.
```{r}
excess_death_2020 %>%
select( age_group, sex, expected_mortality, observed_mortality, excess_mortality, excess_death_percent ) %>%
mutate( expected_mortality = label_percent(accuracy = 0.01 )( expected_mortality ) ,
observed_mortality = label_percent(accuracy = 0.01 )( observed_mortality ) ,
excess_mortality = label_percent(accuracy = 0.001 )( excess_mortality ) ,
excess_death_percent = label_percent(accuracy = 1 )( excess_death_percent ) ) %>%
knitr::kable( col.names = c("Age group", "Sex", "Expected death rate", "Actual death rate", "Absolute excess death rate", "Relative excess death rate"),
caption = sprintf( "Death rates by age group during week 1-%d of 2020", death_2020_max_week ),
format = "markdown",
format.args = list(decimal.mark = ",", big.mark = "'"))
```
The following graph shows the relative values only:
```{r}
# Graph: excess death rate in 2020 compared
ggplot(excess_death_2020, aes(x=age_group, y=excess_death_percent, fill=sex)) +
geom_bar(stat="identity", color="black", position=position_dodge())+
this_theme() +
scale_y_continuous(labels = scales::percent_format(accuracy = 1), "excess death rate") +
ggtitle("Relative increase or decrease of death rate by age group and sex in 2020 compared to a normal")
excess_death_2020_by_age_group <-
excess_death_2020 %>%
group_by( age_group ) %>%
summarise( excess_death_percent = mean(excess_death_percent))
excess_death_2020_age_20 <- (excess_death_2020_by_age_group %>% filter( age_group == "0-24" )) $ excess_death_percent
excess_death_2020_age_84 <- (excess_death_2020_by_age_group %>% filter( age_group == "75-84" )) $ excess_death_percent
```
As you can see, the age group whose death rate is the most different in 2020 than the expectation is the younger one,
which had `r paste(abs(round(100*excess_death_2020_age_20)), "%", sep="")` less risk to die in 2020 than in another year.
We will come back to this number.
The age group whose death rate is the most negatively affected in 2020 is the 75-84 one, with a risk of dying
`r paste(abs(round(100*excess_death_2020_age_84)), "%", sep="")` higher than expected.
### Absolute death rate in historical context
To give some context, here is the historic death rate since 2008 compared to what was _expected_ for 2020. The dashed
horizontal lines show the _actual_ death rate in 2020. It can be seen on this graph that the death rates observed in 2020 would
be typical for year 2012.
```{r}
mortality_in_2020 <- mortality_lr_coefficients
mortality_in_2020$mortality_2020 <- mortality_in_2020$intercept + 2020 * mortality_in_2020$year
mortality_in_2020 <- mortality_in_2020 %>% select( -year, -intercept )
mortality_compared_2020 <-
merge( complete( mortality, sex, year, age ), mortality_in_2020, by = c("age", "sex" ))
mortality_compared_2020$compared_mortality = mortality_compared_2020$mortality / mortality_compared_2020$mortality_2020
mortality_compared_2020 <- na.omit( mortality_compared_2020 )
compared_mortality_for_graph <- mortality_compared_2020 %>%
group_by( age_group, year ) %>%
summarise( mortality_compared_2020 = mean(compared_mortality))
compared_mortality_for_graph_hline_85 <-
1+(excess_death_2020_by_age_group %>% filter( age_group == "85+"))$excess_death_percent
compared_mortality_for_graph_hline_75 <-
1+(excess_death_2020_by_age_group %>% filter( age_group == "75-84"))$excess_death_percent
compared_mortality_for_graph_hline_65 <-
1+(excess_death_2020_by_age_group %>% filter( age_group == "65-74"))$excess_death_percent
compared_mortality_for_graph %>%
filter( year >= 2008 & age_group != "0-24" ) %>%
ggplot( aes( x = year, y = mortality_compared_2020, color = age_group) ) +
geom_line() +
scale_color_discrete() +
scale_x_continuous(name = "year", limits = c(2008, 2020), breaks = 2*1000:1010) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1), name = "death rate relatively to 2020", limits = c(0.9,1.5), breaks = seq(1,3,0.2) ) +
geom_hline( aes(yintercept = compared_mortality_for_graph_hline_85, color = "85+"), alpha = 0.5, linetype = "dashed" ) +
geom_hline( aes(yintercept = compared_mortality_for_graph_hline_75, color = "75-84"), alpha = 0.5, linetype = "dashed" ) +
geom_hline( aes(yintercept = compared_mortality_for_graph_hline_65, color = "65-74"), alpha = 0.5, linetype = "dashed" ) +
this_theme() +
ggtitle("Historic death rate relative to the expected death rate in 2020")
```
For more perspective, here the same graph starting from 1900 on a logarithmic vertical scale.
```{r}
compared_mortality_for_graph %>%
filter( year >= 1900 ) %>%
ggplot( aes( x = year, y = mortality_compared_2020, color = age_group) ) +
geom_point( size = 1 ) +
scale_color_discrete() +
scale_x_continuous(name = "year", breaks = seq(1900,2020,10)) +
scale_y_log10(labels = scales::percent_format(accuracy=1), name = "death rate relatively to 2020 normal", breaks = 10 ^ seq(0,2,0.2), limit = c(1,100)) +
this_theme() +
ggtitle("Historic death rate relative to the expected death rate in 2020")
```
### Death rate variation in historical context
We have seen that the death rate in 2020 was typical for 2012. How can we then explain the framing of the
epidemic, by the government and media, as a catastrophe of unseen since Word War II or even the Spanish flu?
Looking at the _year-to-year variation_ of death rate instead of the death rate itself gives some hints.
How far should we look in the past to see a death rate that is, in absolute terms, 1% to 2% higher than normal?
To answer this question, we will use the death rate data of the Human Mortality Database since 1900. We will
define "normal" as being a 5-year rolling median. The median does a better job than the mean to eliminate exceptional events like epidemics.
The following graph shows only strictly positive deviations and the vertical axis is logarithmic, with horizontal
lines for 0.2%, 1% and 2%, the approximate values of excess death rate for the three oldest age groups.
```{r}
mortality_deviation_min_year <- 1900
mortality_deviation_max_year <- 2020
mortality_deviation %>%
filter(year>=mortality_deviation_min_year & deviation > 0 ) %>%
ggplot( aes( x = year, y = deviation, color = age_group ) ) +
geom_point(aes( shape = sex ), size = 1 ) +
scale_y_log10(labels = scales::percent_format(accuracy=0.001), name = "deviation", breaks = c( 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 1e1)) +
scale_x_continuous(name = "year", limits = c(mortality_deviation_min_year, mortality_deviation_max_year),
breaks = seq(mortality_deviation_min_year, mortality_deviation_max_year, 10), minor_breaks = NULL ) +
this_theme() +
geom_hline( yintercept = 0.02, color = "blue", alpha = 0.4, linetype = "dashed" ) +
geom_hline( yintercept = 0.01, color = "blue", alpha = 0.4, linetype = "dashed" ) +
geom_hline( yintercept = 0.002, color = "blue", alpha = 0.4, linetype = "dashed" ) +
ggtitle("Relative deviation of the yearly death rate from the 5-year rolling median death rate")
```
The next graph aggregates the data by decade and shows the maximum deviation of the yearly death rate from the median death rate during the whole decade,
again with horizontal lines for 0.2%, 1% and 2%.
```{r warning=FALSE}
mortality_deviation %>%
filter(year>=mortality_deviation_min_year ) %>%
mutate( year_rounded = year - ( year %% 10) + 5) %>%
group_by( age_group, year_rounded ) %>%
summarise( max_deviation = max(deviation)) %>%
ggplot( aes( x = year_rounded, y = max_deviation, color = age_group ) ) +
geom_line(aes( )) +
scale_y_log10(labels = scales::percent_format(accuracy=0.001), name = "deviation",
limit = c(1e-5,1),
breaks = c( 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1)) +
scale_x_continuous(name = "year", limits = c(mortality_deviation_min_year, 2020),
breaks = seq(mortality_deviation_min_year, 2020, 10), minor_breaks = NULL ) +
this_theme() +
geom_hline( yintercept = 0.02, color = "blue", alpha = 0.4, linetype = "dashed" ) +
geom_hline( yintercept = 0.01, color = "blue", alpha = 0.4, linetype = "dashed" ) +
geom_hline( yintercept = 0.002, color = "blue", alpha = 0.4, linetype = "dashed" ) +
ggtitle("Maximal deviation the yearly death rate from the 5-year rolling median")
```
It is clear from these 2 graphs that a 2% deviation is usual and even on the small end for the age group over 85 years.
However, for the groups 65-74 and 75-84, we have to go back to the 1960s to see a similar deviation.
### Compared death rate: conclusion
How exceptional is 2020 to other years? The conclusion is paradoxical.
If we look at the absolute death rates, 2020 is similar to 2012. If we look at the excess death rate for the 85+ age group, we also find numbers
that would be usual around 2010. We could then say that 2020 is a one-in-ten-years occurrence, which would make it difficult to justify the
extraordinary restrictions on freedom of movement and gathering.
However, the real originality of 2020 from the point of view of historic death data is not the absolute death rate, not even the excess death rate
for the oldest age group, but the _excess death rate of the 64-85 age group_, for which there is no equivalent since the 1960s.
This paradox brings light to the polarized discussion between those who believe that the epidemic is a normal event that
happens every 10 years (it is, in terms of absolute death rate) and those who believe there was no worse event since World War II
(this is true but the 1960s were not far).
## Comparison with reported COVID deaths
The following graph compares the number of weekly excess deaths with the number of deaths attributed to COVID19, as
reported by Sciensano.
```{r}
compared_covid_death %>%
group_by( date ) %>%
summarise( covid_death = sum(covid_death), excess_death = sum(excess_death) ) %>%
ggplot( aes(x = date)) +
geom_line(aes(y = covid_death, col = "covid deaths")) +
geom_line(aes(y = excess_death, col = "excess deaths")) +
geom_hline(yintercept = 0) +
scale_x_date( labels = date_format("%m-%y"), date_breaks = "1 month", name = "date") +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE), name = "deaths per week") +
ggtitle("Weekly COVID19 deaths vs excess deaths") +
this_theme() +
theme(legend.title = element_blank())
```
We can see that the number of deaths attributed to COVID19 by Sciensano copies quite well the number of excess deaths
estimated by our model, at least during the two epidemic peaks. The most notable exception is during the deaths
peak of August. The second difference that needs to be discussed is the death deficit after the spring peak.
### Death peak in August
```{r}
heat_death_2020 <- round(
(death_expected_weekly %>%
filter( date >= as.Date("2020-08-01") & date < as.Date("2020-08-31") ) %>%
summarise( excess_death = sum(excess_death) ))$excess_death )
heat_death_2019 <- round(
(death_expected_weekly %>%
filter( date >= as.Date("2019-07-01") & date < as.Date("2019-07-31") ) %>%
summarise( excess_death = sum(excess_death) ))$excess_death )
```
The number excess deaths during the heat wave of August 2020 was `r heat_death_2020`. By comparison, the
heat wave of July 2019 caused an excess death of `r heat_death_2019`: 10 times less.
Can this be explained by exceptional temperatures? The following graphs shows the history of maximum daily temperatures in Belgium as well of a 7-day and a
30-day right-aligned moving average of this metric.
```{r warning=FALSE}
weather %>%
filter( date >= as.Date("2018-06-01") ) %>%
group_by( date ) %>%
summarise( air_temperature = max(air_temperature_max) ) %>%
ungroup() %>%
ggplot( aes( x = date, y = air_temperature ) ) +
geom_point(aes(),alpha=0.1, size = 1 )+
geom_line(aes(y=rollmeanr(air_temperature, 7, na.pad=TRUE)),color = "blue")+
geom_line(aes(y=rollmeanr(air_temperature, 30, na.pad=TRUE)),color = "red")+
ggtitle("Maximum daily temperatures and 7-day and 30-day moving averages") +
scale_x_date( labels = date_format("%m-%y"), date_breaks = "2 month",
date_minor_breaks = "1 month", limits = c(as.Date("2018-06-01"),as.Date("2020-12-31") ) )
```
As we can see, the maximal weekly temperatures were not exceptional compared to 2019 or even 2018. Therefore, temperature alone cannot
be a sufficient explanation of the peek of deaths in August 2020. Other factors need to be considered, for instance a generally
weaker condition of the population due to the epidemic or to the lockdown itself.
### Death deficit between death peeks
The second difference between the COVID19 death curve and the excess death curve is that the second shows a death deficit
after the first epidemic peak. This means that a part of people who died during the epidemic peak would have died anyway a couple
of weeks later.
This is more visible on the cumulative excess deaths graph
```{r}
all_cum_death_expected_weekly %>%
filter(date>=as.Date("2020-03-01")) %>%
group_by(date) %>%
summarise( excess_death = sum(excess_death), cum_excess_deaths = sum(cum_excess_deaths)) %>%
ungroup() %>%
ggplot( aes(x = date)) +
geom_line(aes(y = excess_death, col = "weekly excess deaths"), linetype="dashed") +
geom_line(aes(y = cum_excess_deaths, col = "cumulative excess deaths")) +
scale_x_date( date_breaks = "1 month", labels = date_format("%m"), name = "month") +
ggtitle("Cumulative excess deaths (2020)") +
this_theme() +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE), name = "excess deaths") +
theme(legend.title = element_blank())
```
```{r}
death_excess_summer <-
all_cum_death_expected_weekly %>%
filter( date >= as.Date("2020-05-01") & date < "2020-08-01") %>%
summarise( min_cum_excess_deaths = min(cum_excess_deaths), max_cum_excess_deaths = max(cum_excess_deaths))
death_excess_spring <-
all_cum_death_expected_weekly %>%
filter( date >= as.Date("2020-03-01") & date < "2020-06-01") %>%
summarise( min_cum_excess_deaths = min(cum_excess_deaths), max_cum_excess_deaths = max(cum_excess_deaths))
net_death_excess_spring <- round(death_excess_spring$max_cum_excess_deaths - death_excess_spring$min_cum_excess_deaths)
net_death_excess_summer <- round(death_excess_summer$max_cum_excess_deaths - death_excess_summer$min_cum_excess_deaths)
```
During the spring peak, the cumulative excess death was `r net_death_excess_spring`. During the summer, before the August peak, the death deficit was
`r net_death_excess_summer`, that is, approximately `r paste(abs(round(100*net_death_excess_summer/net_death_excess_spring)), "%", sep="")`
of people who died during of COVID19 during spring had their life shortened by less than 3 months.
### Comparision of number of deaths by age group
The following chart compares the number of excess deaths in 2020 with the number of COVID19 deaths.
```{r}
compared_covid_death_by_age_group <-
compared_covid_death %>%
group_by( age_group ) %>%
summarise( covid_death = sum(covid_death), excess_death = sum(excess_death))
compared_covid_death_age_group_85 <-
compared_covid_death_by_age_group %>% filter(age_group == "85+")
reshape2::melt(compared_covid_death_by_age_group,id=("age_group")) %>%
ggplot( aes(x=age_group, y=value, fill=variable)) +
geom_bar(stat='identity', position='dodge') +
scale_y_continuous(labels=function(x) format(x, big.mark = ",", scientific = FALSE), name = "deaths") +
scale_x_discrete((name='age group')) +
scale_fill_discrete(labels=c('COVID deaths', 'excess deaths') ) +
ggtitle("COVID mortality vs excess mortality per age group") +
theme(legend.title = element_blank())
```
The numbers are quite similar, except for the oldest age group where COVID19 deaths are
`r round(compared_covid_death_age_group_85$covid_death - compared_covid_death_age_group_85$excess_death)`
(`r paste(abs(round(100-100*compared_covid_death_age_group_85$covid_death/compared_covid_death_age_group_85$excess_death)), "%", sep="")`)
higher than excess deaths. This means than an approximate 20% of the people over 85 who died of COVID19 during 2020 (or were diagnosed as such)
would have died from another cause in the same period
## Decreased death rate of the youngest age group
The death rate of the 0-24 age group is `r paste(abs(round(100*excess_death_2020_age_20)), "%", sep="")` lower in 2020 than expected. How
can we explain this significant difference? Intuitively, this could be explained by the lower exposure to risk of accidents. Let's test
this hypothesis based on the _Causes of death by month, sex, age group and region_ data set published by Statbel (see data sources).
The cause code V01-Y98 includes all deaths by external causes, including accidents and homicides. This data is available from 2009 to 2019.
Let's visualize the percent of accidental deaths by age group and sex:
```{r}
accidental_death %>%
filter( year == 2017 ) %>%
summarise( percent_accidental_death = sum( accidental_death_observed ) / sum(all_death_observed)) %>%
ggplot( aes( x = age_group, y = percent_accidental_death, fill = sex )) +
geom_bar(position = "dodge", stat="identity") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1), name = "percent of deaths from external causes", limit = c(0,NA)) +
ggtitle("Percent of deaths from external causes") +
this_theme()
```
```{r}
accidental_death %>%
filter( age_group == "0-24") %>%
ggplot( aes( x = year, y = percent_accidental_death, color = sex )) +
geom_line() +
ggtitle("Percent of deaths from external causes in the 0-24 age group") +
this_theme() +
scale_x_continuous(name = "year", limits = c(2009, 2020), breaks = 2009:2020, minor_breaks = NULL ) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1), name = "percent of deaths from external causes", limit = c(0,NA)) +
geom_smooth( method = lm, se=FALSE, linetype="dotted")
```
If we extrapolate this data linearly to 2020, we get a percent of accidental deaths of
`r paste(round(100*accidental_death_2020[accidental_death_2020$sex=='M',"percent_accidental_death"]), "%", sep="") ` for males and
`r paste(round(100*accidental_death_2020[accidental_death_2020$sex=='F',"percent_accidental_death"]), "%", sep="") ` for females
This shows that we cannot explain the decrease of the death rate of the youth in 2020 solely by the decrease
of accidental mortality caused by the lockdowns. For males, it would mean that there would be almost deaths
by accident during the whole period (not just during the lockdowns). For females, accidental death is
smaller than the mortality drop.
Therefore, other factors have to be considered to explain this drop, and the data we analyzed does not provide
any useful hint.
Because of this, it is also useless to try to interpret the death rate in 2020 as a negative factor to the
the virus itself, and a positive factor due to the restrictions of movements and activities. Both factors
probably existed, but the data available now do not allow to quantify them.
## Revisiting controversial questions
This article allows to answer a few polemic questions that rose around the COVID19 epidemic and the response of the
government and mainstream media:
* **Is the number of deaths reported by Sciensano exaggerated, either by biased diagnostic methods or deliberately? **
**No**, there is no indication that the death data reported by Sciensano are incorrect.
* **Would a majority of people who died from COVID19 have died from another cause very soon?**
**No**. Approximately `r paste(abs(round(100*net_death_excess_summer/net_death_excess_spring)), "%", sep="")`
of people who died during of COVID19 would have died less than 3 months from other causes. Based on death data,
we cannot take any conclusion for a longer time period or for the people who died in autumn.
* **Has the restrictions of freedom of movement impacted the death rate?**
**Yes**, the death rate on the 0-24 age group has decreased by 35% relatively to normal in 2020. However this decrease cannot be explained by a
decrease of accidental mortality alone.
* **How exceptional was year 2020 in terms of mortality?**
By definition, an _exception_ is an event that does not occur frequently. Therefore, to assess the exceptionally of an event, we
to find a comparable precedent.
Whether we look at the _absolute_ or _excess_ death rate, we get a different level of exceptionally.
**Once in 50 years** if you look at excess death rate for the age group from 65 to 74 years, i.e. the risk of dying in 2020 compared to expectations.
There is no equivalent of the excess death rate since the 1960s.
**Once in 10 years** if you look at absolute death rate number. The absolute death rates of 2020 are close to the ones of 2010. That is, if there
were in 2010 the same number of deaths than in 2020 for the same population structure, this would be a normal year.
2020 is _at most_ 20% worse than a normal year. For people in the most affected age group, the risk to dying of coronavirus in 2020 (_additionally_ to
normal risks), was 20% than the risk of dying from another reason.
This article also lets a few questions unanswered:
* **How to explain the 35% decrease of mortality in the 0-24 age group?**
A decrease of accidental mortality is not a sufficient explanation. This mortality deficit should be modeled and applied to the 25-45 age group,
which is typically also subject to higher accidental death. This would result in a higher death rate attributed to COVID19 in this age group.
The data analyzed here did not allow us to perform such analysis.
* **How to explain the increased mortality during August 2020?**
The mortality was 10 times higher during the heat peak of July 2019 than during the one of July 2019. Temperatures
were not significantly higher, therefore another explanation is necessary.
Probably the most important question that article _cannot_ answer is: **What would have been the mortality _without_ the lockdowns?** Analyzing
past mortality data in only one country certainly cannot answer this question. Lockdowns do not heal, but they help decreasing the mortality
in two ways: first by avoiding the overload of the health system, secondly because the knowledge of the virus and its cures increase overtime
## Conclusion
This article challenges the believe that the 85+ age group is the most affected by the COVID19 epidemic.
In terms of absolute mortality. However you look at the _relative_ increase of the death rate, the 85+ age group is twice less affected than the 75-74 one.
Perhaps the most striking conclusion is the paradox between the fact that, on one side,
the death rate in 2020 was equivalent to the one around 2010 and, on the other side, there has not been a comparable
year-to-year variation of death rate for the 65-84 age group until the 1960s.
What happened that a death rate by age group that would have been normal in 2010 now causes major disruptions of our societies?
An element of answer is that disruptions were imposed, not based on real mortality numbers, but based on _expectations_ of mortality in order
to _prevent_ higher mortality. It would be more fair to judge the efficiency of political action by the _avoided_ mortality rather than by the actual mortality,
which means that we would need to compare actual mortality with expected mortality for the COVID-19 epidemic in Belgium. It will not be possible
to evaluate the impact of the lockdowns until the epidemic is completely finished. Until such analysis is possible, we can only reason based
on actual mortality.
Historic data clearly shows both the death rate and the year-to-year deviation from the expected death rate have constantly decreased since the 1950s.
This constant reduction of risk may have major sociological, psychological and political implications. Rulers and politicians alike have instrumentalized fear
for centuries in order to control populations, and instrumentalized victims in order to justify their oppressive policies and consolidation of power. With an increasingly lower
tolerance to the risk of dying, the political opportunity to instrument epidemics is also growing. This effect is amplified by the simple fact that the number
of older voters is increasing and politicians have a record of implementing short-sighted strategies just to win more votes.
We have seen that 2020 can be compared to 2010 in terms of absolute mortality. We should also compare 2020 to past years using
other criteria: quality of life, actuation of civic rights, possibility to die in dignity, social justice, respect of the State of Law,
freedom of speech, ... When was the last time when we saw similar regressions in Belgium? We would probably need to look back to 1940-1945.
Are we willing to live in wartime conditions to maintain a mortality under what would be acceptable just one or two decades ago? What it the
acceptable risk? When should we stop accepting restrictions to live in the name of reducing risk?
Once the COVID-19 epidemic will belong to the past, once will be able to estimate the real costs and benefits of the lockdowns, these questions will become unavoidable.
It has been repeated _ad nauseam_ that the economics must not prevail over human life. This oversimplified moral imperative has been used to justify the sacrifice
of small businesses, students and other segments of the population, or the air transportation industry. This motto must be questioned. At least, the media should return
it to a few big companies in industries like food, oil, tobacco or pharmacy -- to see if really still applies to everybody.
The real question of 2020 is rather: should mere biological survival prevail over the possibility a fully human life, a human
life with friends, family, sports, politics, arts, religion, ... all the facets that make life really human.
## Related works
* Scientific Directorate of Epidemiology and public health, [All-cause mortality supports the COVID-19 mortality in Belgium and comparison with major fatal events of the last century](https://archpublichealth.biomedcentral.com/articles/10.1186/s13690-020-00496-x) (covers only March-June 2020)