---
title: "Determinants of Economic Growth"
author: "Johann Power"
date: "12/08/2021"
output:
  html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Introduction
The determinants of economic growth have long been debated in economics, and understanding them is important for improving standards of living across the globe. As Lucas stated: “How did the world economy of today, with its vast differences in income levels and growth rates, emerge from the world of two centuries ago, in which the richest and the poorest societies had income differing by perhaps a factor of two, and in which no society had ever enjoyed sustained growth in living standards?” (Lucas, 2000). The Clark-Fisher three-sector model describes how the structure of an economy changes with its level of development: the least developed countries concentrate on primary sector activities, while more developed countries employ more people in tertiary and quaternary services. These sectoral shifts may explain why different economic growth models suit different countries depending on their level of economic development. For instance, the Malthusian model, which describes output as a function of technology, land and population (Malthus, 1798), explains pre-industrial growth well. The original Solow-Swan neoclassical growth model (Solow, 1956), by contrast, incorporates the role of physical capital and fits the growth patterns of industrializing countries such as the ‘Asian Tiger’ economies during the 1960s, but it fails to explain why productivity was lower in poorer countries and why capital moved so slowly from developed to developing nations. This led to growth models that also incorporated the role of human capital (Romer et al., 1992) and social infrastructure (Hall and Jones, 1999), which better explained growth in higher income countries with higher levels of human capital and social infrastructure. Therefore, as economies develop, the main determinants of long run economic growth tend to change. 
Without, as yet, a universal economic growth model that commands widespread consensus, it makes more sense to study the determinants of economic growth within groups of countries at similar levels of economic development, as their economies are, on average, structurally more similar. This paper assigns each country to one of four development categories based on its average ranking in the Economic Complexity Index (ECI) during the dataset period: most developed countries, developed countries, less developed countries and least developed countries.
Fixed effects multiple linear regression analysis is then applied to each group of countries to quantify the association between each growth determinant and output in structurally similar economies. This also allows us to observe how the relative importance of different growth factors changes as countries develop. This is a fundamentally empirically led study, and its results should not be interpreted causally: like all correlation-based statistical techniques, it cannot by itself establish causation. Nevertheless, this paper aims to guide public policy and further research by highlighting which growth determinants to focus on, depending on a country’s development level, and by identifying findings that either support or undermine existing growth models.
# Background
There have been many empirical studies on the determinants of economic growth, and they usually focus on finding evidence to support an established economic growth model. For instance, Romer et al. (1992) provide cross-country evidence supporting a labour-augmented Solow model, while Barossi-Filho et al. (2005) find updated evidence supporting the predictions of the original Solow model. Papers grounded in a pre-established theoretical model can be more compelling and more likely to support causal interpretations. However, they often restrict themselves to analyzing only a few determinants of economic growth. Economies constantly evolve, and growth models have evolved with them in their success at explaining economic growth. To avoid focusing only on growth models that may suit a certain set of countries in a certain era, this paper takes a broader approach by incorporating multiple possible growth determinants. The mainstream literature pays much attention to growth determinants popularized by the (labour-augmented) Solow-Swan model, such as human and physical capital, but gives less attention to productivity factors, in particular the individual roles of social and physical productivity. Johansson and Tretow (2015) is one such paper: it analyses the role of economic freedom (which this paper classifies as a social productivity factor) in economic growth using multiple linear regression. This paper employs a similar multiple linear regression analysis but also includes country and time fixed effects, and it adds to the literature by using a unique combination of datasets that have not previously been used together to assess the impact of growth determinants across a wide range of structurally different economies. 
In addition, rather than covering all countries together or only distinguishing developed from developing countries, this paper categorizes countries more finely into four levels of development, because the structure of an economy can vary significantly with economic development, as Fisher (1935) first highlighted. Cross-country growth regressions have rightly been criticized for often wrongly inferring causal relationships from their results (Durlauf, 2009). Regressions on variables that are endogenous (as is the case in neoclassical growth theory) cannot support causal claims since, for instance, reverse causality will affect the results. Instead, this study is an empirical analysis that highlights interesting trends and assesses whether they support existing growth models. Its results aim to guide public policy and to identify statistical relationships that can help future researchers formulate new and improved economic growth models.
# Economic Growth Model
The model for economic growth used in this paper is one which contains the main factors of production used across most mainstream economic growth models.
Formally:
$$y=f(k,h,z_p,z_s)$$
where:
$y$ = output per capita
$k$ = physical capital per capita
$h$ = human capital per capita
$z_p$ = Physical productivity
$z_s$ = Social productivity
One major difference between this growth model and conventional models is that total factor productivity (TFP) is split into two components: physical and social productivity. This is done to observe the individual effect of each component; together, the datasets used to proxy the two productivity factors roughly approximate the theoretically implied TFP level of an economy derived via growth accounting techniques (shown in Appendix I). One important growth factor that is not included is land, as the data available for this study do not cover a long enough period to observe any significant shift in land endowments for most countries. In addition, land varies significantly in productivity and in the extent to which it contains valuable natural resources. This variable is therefore left out of the economic growth model, but further studies could assess the contribution of different types and endowments of land to economic growth.
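The paper does not commit to a functional form for $f$. Purely as an illustrative assumption, if $f$ were Cobb-Douglas with a constant $A$, taking natural logarithms would make the model linear in its parameters, which matches the log-log regressions estimated later and lets each coefficient be read as an elasticity:

$$y = A\,k^{\beta_1}h^{\beta_2}z_p^{\beta_3}z_s^{\beta_4} \quad\Longrightarrow\quad \ln y = \ln A + \beta_1\ln k + \beta_2\ln h + \beta_3\ln z_p + \beta_4\ln z_s$$

Under this assumption, $\beta_1$ is approximately the percentage change in output per capita associated with a 1% increase in physical capital per capita, holding the other factors fixed, and $\ln A$ is the kind of term that country and time fixed effects absorb.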
# Data Used
In total, this study uses annual data for each of the model’s growth determinants for 78 countries for 13 years between 2007 – 2019.
## Output per Capita
GDP per capita is used as a measure of output per capita and is sourced from the World Bank. GDP per capita figures are measured in current US dollars with data available for the full dataset period from 2007-2019. The World Bank calculates GDP as the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products.
## Physical Capital per Capita
Gross fixed capital formation, sourced from the World Bank for 2007-2019 in current US dollars, is used as a proxy for physical capital; it is converted to per capita terms by dividing by total population during data cleaning.
## Human Capital per Capita
The World Bank Human Capital Index (HCI) is used as a measure of human capital per capita in a country. Kraay (2018) presents the methodology used in calculating the HCI but essentially it is a measure which assesses how well a country performs in the following attributes:
#### 1. Survival
• Share of children surviving past the age of 5 in %
#### 2. School
• Quantity of education (Expected years of schooling by age 18)
• Quality of education (Harmonized test scores)
#### 3. Health
• Adult survival rates (Share of 15-year-olds who survive until age 60 in %)
• Healthy growth among children (Stunting rates of children under 5 in %)
One limitation of the HCI is that data are missing for some years in the dataset period. These missing data points have been interpolated via linear regression on the available original data. An alternative, commonly used proxy for human capital is the Barro-Lee educational attainment dataset. However, it is available at best every 5 years, which provides fewer observations than the HCI over the same period, and the HCI is a more holistic measure of human capital.
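The paper does not show its interpolation code. Below is a minimal sketch of one way to do it, assuming that "linear regression on the original available data" means fitting a straight-line trend of the index on year within a single country and predicting the missing years; the function name `fill_with_linear_trend` and the example values are hypothetical.

```r
# Minimal sketch (assumption): fill missing yearly index values for ONE country
# by fitting a straight-line trend (OLS of score on year) to the observed years
# and predicting the missing years. Observed values are left unchanged.
fill_with_linear_trend <- function(year, score) {
  observed <- !is.na(score)
  if (sum(observed) < 2) {
    stop("need at least two observed years to fit a linear trend")
  }
  trend <- lm(score ~ year,
              data = data.frame(year = year[observed], score = score[observed]))
  filled <- score
  filled[!observed] <- predict(trend, newdata = data.frame(year = year[!observed]))
  filled
}

# Hypothetical example: an index reported for only some years
years <- 2010:2017
hci <- c(0.50, NA, 0.52, NA, NA, 0.55, NA, 0.57)
round(fill_with_linear_trend(years, hci), 2)  # 0.50, 0.51, ..., 0.57
```

In the panel, the function would be applied separately within each country, for example inside `dplyr::group_by(country)` followed by `mutate()`. Because the fitted line uses all observed years, filled values follow the country's overall trend rather than simply connecting the two neighbouring observations; `stats::approx()` would be the alternative if point-to-point interpolation were intended.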
## Physical Productivity
This paper uses the term physical productivity for the technology that contributes to productive efficiency, for example more efficient machines in a production plant, or better software and machine learning capabilities that automate parts of the production process. To assess the level of physical productivity in a country, this paper uses the ICT Development Index (IDI) published by the International Telecommunication Union (ITU), a United Nations agency, as it gives an approximation of the technology level within a country and allows useful comparisons between countries. The IDI uses 11 internationally agreed indicators to measure developments in information and communication technology (ICT) across countries and over time. These indicators are grouped into three sub-indices, access, use and skills, as listed below:
#### Access to ICT
1. Fixed telephone subscriptions per 100 inhabitants
2. Mobile phone subscriptions per 100 inhabitants
3. International Internet bandwidth (bits/s) per Internet user
4. Percentage of households with a computer
5. Percentage of households with Internet access
#### Use of ICT
6. Percentage of people using the Internet
7. Fixed broadband subscriptions per 100 inhabitants
8. Mobile broadband subscriptions per 100 inhabitants
#### ICT Skills
9. Adult literacy rate
10. Gross secondary school enrollment rate
11. Gross tertiary (higher education) enrollment rate
One limitation of the IDI is that data are missing for some years in the dataset period; these missing data points have been interpolated via linear regression on the available original data. In addition, the index cannot capture every physical productivity factor in an economy, as no single index could. However, like the other indices used in this paper, it offers a good estimate and allows useful comparisons across countries and over time.
## Social Productivity
This paper uses the term social productivity for the social, political or cultural elements that contribute to productive efficiency, for example a country's (work) culture, rule of law and institutions, insofar as they affect how efficiently it can produce output. The Economic Freedom Index (EFI) published by The Heritage Foundation and The Wall Street Journal is used as a measure of social productivity. The EFI measures a country’s degree of economic freedom based on 12 qualitative and quantitative factors grouped into four main categories: rule of law, government size, regulatory efficiency and open markets.
#### Rule of Law
1. Property rights
2. Government integrity
3. Judicial effectiveness
#### Government Size
4. Government spending
5. Tax burden
6. Fiscal health
#### Regulatory Efficiency
7. Business freedom
8. Labour freedom
9. Monetary freedom
#### Open Markets
10. Trade freedom
11. Investment freedom
12. Financial freedom
Each of these 12 indicators is graded on a scale of 0–100, and the scores are then averaged with equal weights to obtain the final EFI score.
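As a small illustration of the equal-weight averaging described above, the sketch below computes an overall score from 12 hypothetical component scores. The column names are illustrative placeholders, not the Heritage Foundation's own field names.

```r
# Sketch (assumption): the overall EFI score is the simple, equal-weight mean of
# the 12 component scores, each on a 0-100 scale. Column names are hypothetical.
efi_components <- c("property_rights", "government_integrity", "judicial_effectiveness",
                    "government_spending", "tax_burden", "fiscal_health",
                    "business_freedom", "labour_freedom", "monetary_freedom",
                    "trade_freedom", "investment_freedom", "financial_freedom")

overall_efi <- function(component_scores) {
  stopifnot(length(component_scores) == 12,
            all(component_scores >= 0 & component_scores <= 100))
  mean(component_scores)
}

# Hypothetical component scores for one country-year
scores <- setNames(c(80, 70, 75, 40, 60, 90, 85, 65, 80, 88, 75, 70), efi_components)
overall_efi(scores)  # 878 / 12, about 73.17
```

For a data frame with one row per country-year and these 12 columns, `rowMeans(df[efi_components])` gives the same equal-weight average for every row at once.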
Using growth accounting techniques, we find that, taken together, the physical and social productivity proxies approximate total factor productivity well: across all countries they overestimate the theoretical TFP by only 2.5%, as shown in Appendix A.
## Country Development Level Groupings
This paper splits countries into four development categories based on their average ranking in the Economic Complexity Index (ECI) during the dataset period: most developed, developed, less developed and least developed countries. The ECI ranks countries by how diversified and complex their export baskets are. Countries with a great diversity of productive know-how, particularly complex specialized know-how, can produce a great diversity of sophisticated products. This arguably makes the ECI a better measure of a country’s level of development than the usual proxies such as GDP per capita, as it more directly captures the productive potential and level of advancement of an economy, which is what long run economic growth models aim to explain. The ECI is nevertheless strongly correlated with GDP per capita, has been found to be a strong predictor of current income levels, and can predict faster future growth when a country's economic complexity exceeds what would be expected given its income level. The ECI is therefore a useful measure of economic development, and this paper adopts it.
The following classification is used (a lower ranking indicates a more complex economy):
• Most developed countries: mean ECI ranking over the dataset period less than or equal to 22
• Developed countries: mean ECI ranking over the dataset period greater than 22 and less than or equal to 43
• Less developed countries: mean ECI ranking over the dataset period greater than 43 and less than or equal to 85
• Least developed countries: mean ECI ranking over the dataset period greater than 85
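The classification rule above can be expressed with base R's `cut()`. The sketch assumes a country's mean ECI ranking has already been computed; the function name and example rankings are hypothetical.

```r
# Sketch: map a country's mean ECI ranking over the dataset period to the four
# development groups defined above. cut() uses right-closed intervals here:
# (-Inf, 22], (22, 43], (43, 85], (85, Inf].
classify_development <- function(mean_eci_rank) {
  cut(mean_eci_rank,
      breaks = c(-Inf, 22, 43, 85, Inf),
      labels = c("Most developed", "Developed", "Less developed", "Least developed"),
      right = TRUE)
}

# Hypothetical mean rankings, including values exactly on the boundaries
classify_development(c(5, 22, 22.5, 43, 60, 85, 100))
```

The mean ranking itself could be obtained with, for example, `aggregate(eci_rank ~ country, data = eci, FUN = mean)`, where `eci` and `eci_rank` are hypothetical names for a long-format table of yearly ECI rankings.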
## Data Cleaning
To analyse these datasets in R we first need to load all the necessary packages and files and clean all our data so that it is ready to be statistically analysed.
### Load Packages
The following packages are used:
```{r}
library(readxl)
library(regclass)
library(plm)
library(dplyr)
library(tidyverse)
library(haven)
library(ggplot2)
library(readr)
library(data.table)
library(magrittr)
library(knitr)
library(rmarkdown)
library(reshape2)
library(car)
library(janitor)
library(corrplot)
library(ggcorrplot)
library(lmtest)
library(gvlma)
library(sandwich)
library(fixest)
library(kableExtra)
```
### Load Files
We now load our raw data files:
```{r, echo = TRUE, eval = TRUE}
# Machine-specific project folder; change this to wherever the data files are stored
project_dir <- "C:\\Users\\User\\Documents\\Uni\\Exchange Year - Sciences Po\\LT\\Econometrics\\Final Project"
country_data <- read_csv(file.path(project_dir, "Country Data", "final_country_data2.csv"))
EFI_data <- read_csv(file.path(project_dir, "Index Data", "Final Data", "Final_EFI_2.csv"))
HCI_data <- read_csv(file.path(project_dir, "Index Data", "Final Data", "Final_HCI_2.csv"))
IDI_data <- read_csv(file.path(project_dir, "Index Data", "Final Data", "Final_IDI_2.csv"))
```
As most of our growth determinant variables are in separate files, we need to merge all these datasets into one.
```{r}
merged1 = left_join(country_data, EFI_data)
merged2 = left_join(merged1, HCI_data)
countries = left_join(merged2, IDI_data)
```
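The chained `left_join()` calls above let dplyr infer the join columns from the names the tables share. Below is a minimal base R sketch of the same step with the keys stated explicitly, assuming every file identifies observations by `country` and `year`, plus a guard against duplicated keys; the function name and toy tables are hypothetical.

```r
# Sketch (assumption): each source table has one row per country-year and
# shares the key columns `country` and `year`. Reduce() chains left joins over
# the list, and the row-count check catches duplicated keys, which would
# otherwise silently multiply rows in a join.
merge_panels <- function(base, others, keys = c("country", "year")) {
  merged <- Reduce(function(left, right) merge(left, right, by = keys, all.x = TRUE),
                   others, init = base)
  if (nrow(merged) != nrow(base)) {
    stop("join changed the number of rows: check for duplicated country-year keys")
  }
  merged
}

# Hypothetical toy tables
gdp <- data.frame(country = c("A", "A", "B", "B"), year = c(2007, 2008, 2007, 2008),
                  GDP_per_capita = c(100, 110, 50, 55))
efi <- data.frame(country = c("A", "A", "B", "B"), year = c(2007, 2008, 2007, 2008),
                  EFI_score = c(70, 71, 60, 61))
hci <- data.frame(country = c("A", "B"), year = c(2007, 2007), HCI_score = c(0.7, 0.5))
merged <- merge_panels(gdp, list(efi, hci))
merged
```

With dplyr, the equivalent explicit form is `left_join(x, y, by = c("country", "year"))`.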
We also remove scientific notation for convenience.
```{r}
options(scipen = 999)
```
After merging the files, we check whether all the data have been downloaded and merged correctly. Our original data contain 79 countries with 13 years of data each, so we should have 13 × 79 = 1,027 rows.
```{r}
nrow(countries)
```
This equals 1027 as expected.
We also need to check for missing values. Because missing values in World Bank downloads are usually denoted by ‘..’ rather than ‘NA’, we first convert any such values in the investment column to ‘NA’ and then count them with the sum(is.na()) function.
```{r}
countries$'investment' <- as.numeric(ifelse(countries$'investment' =='..', NA, countries$'investment'))
sum(is.na(countries$investment))
sum(is.na(countries))
```
Comparing the total number of missing values with the number in the investment column shows that all the missing values are in the investment column. Let’s now identify which countries have missing data.
```{r, results='asis'}
missing_values <- countries[rowSums(is.na(countries)) > 0,]
kable(missing_values, caption = 'Countries with missing data') %>% kable_styling()
```
So, our missing values were caused by missing investment data for Qatar. We thus remove Qatar from the dataset.
```{r}
countries_cleaned = countries %>% filter(country != "Qatar")
sum(is.na(countries_cleaned))
```
Now we no longer have any missing values.
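The conversion above handles the ‘..’ placeholder in the `investment` column only. A more general sketch, which treats ‘..’ as missing in every text column and converts columns that are then fully numeric, is shown below; `dots_to_na` and the example table are hypothetical.

```r
# Sketch: treat the World Bank's ".." placeholder as missing in every text
# column, then convert a column to numeric if all its remaining values parse.
dots_to_na <- function(df, placeholder = "..") {
  df[] <- lapply(df, function(col) {
    if (!is.character(col)) return(col)
    col[which(col == placeholder)] <- NA
    as_num <- suppressWarnings(as.numeric(col))
    # Keep the numeric version only if parsing introduced no new NAs
    if (identical(is.na(as_num), is.na(col))) as_num else col
  })
  df
}

# Hypothetical raw table as it might look after reading a World Bank export
raw <- data.frame(country = c("A", "B", "C"),
                  investment = c("1200.5", "..", "830"),
                  stringsAsFactors = FALSE)
clean <- dots_to_na(raw)
str(clean)
```

Alternatively, the placeholder can be declared when reading the file, e.g. `read_csv(path, na = c("", "NA", ".."))`, so that readr parses such columns as numeric directly.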
```{r}
nrow(countries_cleaned)
```
The number of rows is now 1,014 as expected, since we removed one country’s 13 years of data (1,027 − 13 = 1,014).
We also sort the data by year rather than by country.
```{r}
countries_cleaned = countries_cleaned[order(countries_cleaned$year),]
```
## Alter variables
Next, we convert physical capital to per capita terms and remove variables that are no longer needed.
```{r}
countries_cleaned = countries_cleaned %>% mutate(investment = investment/total_population) %>% select(-working_age_population) %>% select(-total_population)
```
Thus, our final cleaned data file containing all the data we will use for our analysis is in the ‘countries_cleaned’ dataframe.
```{r results='asis'}
# scroll_box() wraps the styled table in a scrollable container, so it goes last in the pipe
kable(countries_cleaned) %>% kable_paper() %>% scroll_box(width = "800px", height = "400px")
```
## Explore the Data
We now explore the data to spot any immediate relationships. Since we will be performing regression analysis, it is helpful to get an early view of how our variables relate, as very high correlations between regressors (multicollinearity) inflate the variance of the estimated regression coefficients, making them imprecise and unstable. This is purely for insight and not a filter on which variables to include, as we will not be running our regressions on the whole dataset.
```{r}
# Keep only the numeric model variables, then plot their correlation matrix
correlation_analysis_countries = countries_cleaned %>% select(-year, -country)
correlations = cor(correlation_analysis_countries)
ggcorrplot(correlations)
```
Note that for the purposes of running a regression analysis, only the correlations between independent variables (regressors) are important for assessing multicollinearity.
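A more direct check on multicollinearity than pairwise correlations is the variance inflation factor (VIF), which measures how much a coefficient's variance is inflated by correlation with the other regressors. The sketch below computes it by hand on simulated data; the `car` package loaded above also provides `car::vif()` for fitted `lm` models. The thresholds mentioned in the comment are common rules of thumb, not values from this paper.

```r
# Sketch: variance inflation factors computed by hand. For each regressor j,
# regress it on the other regressors; VIF_j = 1 / (1 - R_j^2).
# Values above roughly 5-10 are a common rule of thumb for problematic collinearity.
manual_vif <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(j) {
    others <- setdiff(names(X), j)
    fit <- lm(reformulate(others, response = j), data = X)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Simulated regressors (not this paper's data)
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)  # strongly correlated with x1
x3 <- rnorm(n)                       # roughly independent of both
manual_vif(data.frame(x1, x2, x3))
```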
We can also observe the relationship between the economic growth determinants and GDP per capita in general across all countries to gain an idea of what to expect.
```{r}
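plot(log(countries_cleaned$GDP_per_capita), log(countries_cleaned$investment), xlab = "log(GDP per capita)", ylab = "log(physical capital per capita)")
plot(log(countries_cleaned$GDP_per_capita), log(countries_cleaned$HCI_score), xlab = "log(GDP per capita)", ylab = "log(HCI score)")
plot(log(countries_cleaned$GDP_per_capita), log(countries_cleaned$IDI_score), xlab = "log(GDP per capita)", ylab = "log(IDI score)")
plot(log(countries_cleaned$GDP_per_capita), log(countries_cleaned$EFI_score), xlab = "log(GDP per capita)", ylab = "log(EFI score)")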
```
This paper analyses the determinants of economic growth by level of economic development, as growth factors are likely to vary across development levels because the economies concerned are structurally different. We classify the countries in our dataset into 4 groups, from 1 (most developed) to 4 (least developed), based on their average ranking in the Economic Complexity Index (ECI) over the 13-year dataset period.
```{r}
G1_countries = countries_cleaned %>% filter(country %in% c('Austria', 'Belgium', 'Switzerland', 'Czech Republic', 'Germany', 'Denmark', 'Finland', 'France', 'United Kingdom', 'Hungary', 'Ireland', 'Italy', 'Japan', 'Mexico', 'Singapore', 'Slovenia', 'Sweden', 'United States', 'Luxembourg'))
G2_countries = countries_cleaned %>% filter(country %in% c('Bulgaria', 'Canada', 'Cyprus', 'Spain', 'Estonia', 'Croatia', 'Israel', 'Lithuania', 'Latvia', 'Malaysia', 'Netherlands', 'Norway', 'Panama', 'Poland', 'Portugal', 'Romania', 'Thailand', 'Turkey', 'Iceland', 'Malta'))
G3_countries = countries_cleaned %>% filter(country %in% c('Albania', 'United Arab Emirates', 'Argentina', 'Australia', 'Bahrain', 'Brazil', 'Chile', 'Colombia', 'Costa Rica', 'Georgia', 'Greece', 'Indonesia', 'Jordan', 'Moldova', 'Mauritius', 'Namibia', 'New Zealand', 'Saudi Arabia', 'Tunisia', 'Ukraine', 'Uruguay', 'South Africa'))
G4_countries = countries_cleaned %>% filter(country %in% c('Azerbaijan', 'Burkina Faso', 'Botswana', 'Cameroon', 'Algeria', 'Ecuador', 'Kazakhstan', 'Morocco', 'Madagascar', 'Oman', 'Peru', 'Paraguay', 'Senegal', 'Uganda', 'Zimbabwe'))
```
The following table shows the countries in each group.
```{r}
# No. is a row number; shorter groups are padded with '' to a common length of 22
country_groups_table <- data.frame(
  No. = 1:22,
  Group1_Countries = c('Austria', 'Belgium', 'Switzerland', 'Czech Republic', 'Germany', 'Denmark', 'Finland', 'France', 'United Kingdom', 'Hungary', 'Ireland', 'Italy', 'Japan', 'Mexico', 'Singapore', 'Slovenia', 'Sweden', 'United States', 'Luxembourg', '', '', ''),
  Group2_Countries = c('Bulgaria', 'Canada', 'Cyprus', 'Spain', 'Estonia', 'Croatia', 'Israel', 'Lithuania', 'Latvia', 'Malaysia', 'Netherlands', 'Norway', 'Panama', 'Poland', 'Portugal', 'Romania', 'Thailand', 'Turkey', 'Iceland', 'Malta', '', ''),
  Group3_Countries = c('Albania', 'United Arab Emirates', 'Argentina', 'Australia', 'Bahrain', 'Brazil', 'Chile', 'Colombia', 'Costa Rica', 'Georgia', 'Greece', 'Indonesia', 'Jordan', 'Moldova', 'Mauritius', 'Namibia', 'New Zealand', 'Saudi Arabia', 'Tunisia', 'Ukraine', 'Uruguay', 'South Africa'),
  Group4_Countries = c('Azerbaijan', 'Burkina Faso', 'Botswana', 'Cameroon', 'Algeria', 'Ecuador', 'Kazakhstan', 'Morocco', 'Madagascar', 'Oman', 'Peru', 'Paraguay', 'Senegal', 'Uganda', 'Zimbabwe', '', '', '', '', '', '', '')
)
```
```{r, results='asis'}
kable(country_groups_table, caption = "List of countries in each group") %>% kable_styling()
```
The number of observations for each group:
```{r}
nrow(G1_countries)
nrow(G2_countries)
nrow(G3_countries)
nrow(G4_countries)
```
As we can see, we have the least data for the least developed countries which is to be expected.
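Because the four groups are defined by typing country names into `filter()` calls, a quick check that every country in the panel lands in exactly one group can catch spelling mismatches or omissions. A minimal sketch, with a hypothetical helper and toy inputs:

```r
# Sketch: report countries missing from every group, countries listed in more
# than one group, and group entries that do not match any country in the panel.
check_group_assignment <- function(all_countries, groups) {
  assigned <- unlist(groups, use.names = FALSE)
  list(unassigned   = setdiff(all_countries, assigned),
       duplicated   = unique(assigned[duplicated(assigned)]),
       not_in_panel = setdiff(assigned, all_countries))
}

# Hypothetical toy inputs
groups <- list(G1 = c("Austria", "Belgium"), G2 = c("Canada"), G3 = c("Chile", "Belgium"))
check_group_assignment(c("Austria", "Belgium", "Canada", "Chile", "Peru"), groups)
```

Applied to this paper's data, one would pass `unique(countries_cleaned$country)` and a list of the four name vectors used in the `filter()` calls above.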
# Methodology
## Multiple Linear Regression Model
Multiple linear regression analysis is used for each group of countries to analyze the relative importance of economic growth determinants. For each regression, the natural logarithm of all variables is taken, as the variables differ greatly in magnitude and logging can help reduce the impact of heteroskedasticity. Because we have panel data, fixed effects regression can be used to control for heterogeneous time-invariant attributes across countries. The Hausman test (Hausman, 1978) is used to decide between fixed and random effects. Its null hypothesis is that the country-specific effects are uncorrelated with the regressors, in which case both estimators are consistent and random effects is preferred because it is more efficient; under the alternative hypothesis, only the fixed effects estimator is consistent, so fixed effects is preferred. For each group the p-value is less than 0.05, so the null hypothesis is rejected at the 5% significance level and a fixed effects model is adopted. Tests for including time effects are conducted similarly, and they indicate that time effects should be included in all models at the 5% significance level.
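For reference, the Hausman statistic in its standard textbook form compares the two coefficient vectors directly, weighting their difference by the difference in their estimated covariance matrices; plm's `phtest()` reports a chi-squared statistic of this form by default:

$$H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)^{\top}\left[\widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE})\right]^{-1}\left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right) \;\overset{a}{\sim}\; \chi^2_K$$

where $K$ is the number of coefficients compared (here the four log regressors, so $K = 4$). A large $H$, and hence a small p-value, indicates that the two estimators differ by more than sampling noise would explain, which is evidence against the random effects assumption.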
```{r}
fixedG1 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G1_countries, index=c("country", "year"), model="within")
randomG1 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G1_countries, index=c("country", "year"), model="random")
phtest(fixedG1, randomG1)
```
Therefore, as the p-value < 0.05, we can reject the null hypothesis at the 5% significance level and conclude that we should use a fixed effects model.
```{r}
fixedG2 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G2_countries, index=c("country", "year"), model="within")
randomG2 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G2_countries, index=c("country", "year"), model="random")
phtest(fixedG2, randomG2)
```
Therefore, as the p-value < 0.05, we can reject the null hypothesis at the 5% significance level and conclude that we should use a fixed effects model.
```{r}
fixedG3 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G3_countries, index=c("country", "year"), model="within")
randomG3 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G3_countries, index=c("country", "year"), model="random")
phtest(fixedG3, randomG3)
```
Therefore, as the p-value < 0.05, we can reject the null hypothesis at the 5% significance level and conclude that we should use a fixed effects model.
```{r}
fixedG4 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G4_countries, index=c("country", "year"), model="within")
randomG4 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G4_countries, index=c("country", "year"), model="random")
phtest(fixedG4, randomG4)
```
Therefore, as the p-value < 0.05, we can reject the null hypothesis at the 5% significance level and conclude that we should use a fixed effects model.
Now we similarly test whether we should include time effects in the model:
```{r}
fixed.timeG1 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score) + factor(year), data=G1_countries, index=c("country","year"), model="within")
pFtest(fixed.timeG1, fixedG1)
```
Therefore, as the p-value < 0.05, we can reject the null hypothesis at the 5% significance level and conclude that we should include time effects.
```{r}
fixed.timeG2 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score) + factor(year), data=G2_countries, index=c("country","year"), model="within")
pFtest(fixed.timeG2, fixedG2)
```
Therefore, as the p-value < 0.05, we can reject the null hypothesis at the 5% significance level and conclude that we should include time effects.
```{r}
fixed.timeG3 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score) + factor(year), data=G3_countries, index=c("country","year"), model="within")
pFtest(fixed.timeG3, fixedG3)
```
Therefore, as the p-value < 0.05, we can reject the null hypothesis at the 5% significance level and conclude that we should include time effects.
```{r}
fixed.timeG4 <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score) + factor(year), data=G4_countries, index=c("country","year"), model="within")
pFtest(fixed.timeG4, fixedG4)
```
Therefore, as the p-value < 0.05, we can reject the null hypothesis at the 5% significance level and conclude that we should include time effects.
Each regression model therefore includes both country and time fixed effects. There are no further control variables. In a fixed effects multiple linear regression, each regressor coefficient shows the association between that growth determinant and output while holding the other growth factors constant and controlling for time-invariant heterogeneity across countries, as well as for factors that are common to all countries but vary over time. Formally, the entity and time fixed effects regression model for each group of countries is as follows:
$$ln(y_{it})= β_0+β_1ln(k_{it})+β_2ln(h_{it})+β_3ln(z_{p_{it}})+β_4ln(z_{s_{it}})+γ_2D2_i+...+γ_nDn_i+δ_2 B2_t+...+δ_T BT_t+e_{it}$$
where:
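$D2_i,...,Dn_i$ = dummy variables for countries 2 to n, with coefficients $γ_2,...,γ_n$ ($Dj_i = 1$ if country i is the jth country in the dataset and 0 otherwise). These country dummies control for unobserved time-invariant heterogeneities across countries.
$B2_t,...,BT_t$ = dummy variables for years 2 to T, with coefficients $δ_2,...,δ_T$ ($Bs_t = 1$ if t is the sth year and 0 otherwise). These year dummies control for factors that are common to all countries but vary over time.
$e_{it}$ = error term, i.e. the part of $ln(y_{it})$ not explained by the regressors and fixed effects; its sample counterpart is the residual, the difference between the observed value and the model's fitted value.
Only n−1 country dummies and T−1 year dummies are included (the dummies $D1_i$ and $B1_t$ are omitted) because the regression model already includes an intercept $β_0$; including all of them would make the dummies perfectly collinear with the intercept.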
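A minimal sketch of this dummy-variable convention, assuming a hypothetical toy panel of three countries (A, B, C) over two years: R's default treatment contrasts in `model.matrix` drop the first level of each factor, leaving n−1 country dummies and T−1 year dummies alongside the intercept.

```r
# Hypothetical toy panel: 3 countries observed in 2 years
toy <- data.frame(country = factor(c("A", "A", "B", "B", "C", "C")),
                  year    = factor(c(2018, 2019, 2018, 2019, 2018, 2019)))

# Treatment contrasts omit the first level of each factor (country A, year 2018),
# so the design matrix has 1 intercept + (3 - 1) country + (2 - 1) year columns
X <- model.matrix(~ country + year, data = toy)
colnames(X)   # columns: (Intercept), countryB, countryC, year2019
qr(X)$rank    # full column rank: no dummy variable trap
```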
The regression model can be simplified in notation as follows:
$$ln(y_{it})=β_1ln(k_{it})+β_2ln(h_{it})+β_3ln(z_{p_{it}})+β_4ln(z_{s_{it}})+α_i+δ_t+e_{it}$$
where:
$α_i$ = Entity fixed effect i.e. country-specific intercepts that capture time-invariant heterogeneities across countries e.g. country climates, cultures.
$δ_t$ = Time fixed effect i.e. time-specific intercepts that capture differences in log GDP per capita that vary across time periods but not across individual countries e.g. global macroeconomic conditions like the impact of the Covid-19 pandemic.
To estimate the regression coefficients $(β_0,β_1,β_2,β_3,β_4)$, we use ordinary least squares (OLS) with fixed effects. OLS chooses the coefficients and fixed effects that minimize the sum of squared residuals (the differences between the observed dependent variable and the values predicted from the independent variables and fixed effects), summed over all countries i and years t:
$$\min\sum_{i}\sum_{t}e_{it}^2=\min\sum_{i}\sum_{t}\left(ln(y_{it})-\widehat{ln(y_{it})}\right)^2$$
$$=\min\sum_{i}\sum_{t}\left[ln(y_{it})-\left(β_0+β_1ln(k_{it})+β_2ln(h_{it})+β_3ln(z_{p_{it}})+β_4ln(z_{s_{it}})+γ_2D2_i+...+γ_nDn_i+δ_2 B2_t+...+δ_T BT_t\right)\right]^2$$
Or, equivalently:
$$\min\sum_{i}\sum_{t}\left[ln(y_{it})-\left(β_1ln(k_{it})+β_2ln(h_{it})+β_3ln(z_{p_{it}})+β_4ln(z_{s_{it}})+α_i+δ_t\right)\right]^2$$
Provided the fixed effects regression assumptions hold, the sampling distribution of the OLS estimator in the fixed effects regression model is approximately normal in large samples. Thus, the variance of the estimates can be estimated, and standard errors, t-statistics and confidence intervals can be computed for the coefficients.
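The equivalence between the dummy-variable form and the fixed effects (within) form of the model can be checked numerically. The sketch below uses a hypothetical simulated balanced panel (8 countries, 6 years; the variables `log_k` and `log_y` are invented for illustration). It estimates the slope once with explicit country and year dummies and once after the two-way within transformation; by the Frisch-Waugh-Lovell theorem the two slopes coincide. The simple demeaning formula used here is exact only for balanced panels; `plm(..., model = "within", effect = "twoways")` is designed to handle the general case.

```r
# Hypothetical simulated balanced panel: 8 countries x 6 years
set.seed(7)
sim <- expand.grid(country = factor(paste0("C", 1:8)), year = factor(2010:2015))
alpha <- rnorm(8)   # country fixed effects
delta <- rnorm(6)   # year fixed effects
sim$log_k <- rnorm(nrow(sim))
sim$log_y <- 0.5 * sim$log_k + alpha[as.integer(sim$country)] +
  delta[as.integer(sim$year)] + rnorm(nrow(sim), sd = 0.1)

# (1) OLS with explicit country and year dummies (least squares dummy variables)
lsdv_fit <- lm(log_y ~ log_k + country + year, data = sim)

# (2) Two-way within transformation: subtract country and year means, add back the grand mean
demean_twoway <- function(v, g1, g2) v - ave(v, g1) - ave(v, g2) + mean(v)
sim$y_dd <- demean_twoway(sim$log_y, sim$country, sim$year)
sim$k_dd <- demean_twoway(sim$log_k, sim$country, sim$year)
# No intercept: demeaned variables have mean zero. Note that lm() does not count the
# absorbed dummies in within_fit's residual degrees of freedom, so its standard errors are not correct.
within_fit <- lm(y_dd ~ k_dd - 1, data = sim)

c(lsdv = coef(lsdv_fit)[["log_k"]], within = coef(within_fit)[["k_dd"]])
```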
## Testing Regression Assumptions
The following regression assumptions should hold for reliable inference from the fixed effects regression models:
1. No multicollinearity
2. Linearity
3. Normally distributed errors
4. Independent error terms (no autocorrelation)
5. Homoskedasticity (constant error variance)
6. No endogeneity (regressors uncorrelated with the error term)
The following measures are used to test if the regression models satisfy each assumption.
### Assumption 1: Multicollinearity
While multicollinearity does not reduce the predictive power of a model, it makes the estimated coefficients of the individual regressors unstable and inflates their standard errors, and these coefficients are exactly what we need to measure the importance of each economic growth determinant as countries develop. The pairwise Pearson correlation coefficient between regressors is used to test for multicollinearity: if the absolute correlation between two regressors exceeds 0.75, at least one of them is dropped from the model.
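A sketch of this 0.75 correlation screen, assuming hypothetical simulated regressors that reuse the paper's column names (`investment`, `EFI_score`, `IDI_score`, `HCI_score`) purely as labels. `IDI_score` is deliberately constructed to be collinear with `investment`. The code lists each pair of regressors whose absolute Pearson correlation exceeds the threshold.

```r
# Hypothetical simulated (already logged) regressors; not the paper's data
set.seed(1)
n <- 200
regs <- data.frame(investment = rnorm(n))
regs$IDI_score <- 0.9 * regs$investment + rnorm(n, sd = 0.3)   # built to be collinear
regs$EFI_score <- rnorm(n)
regs$HCI_score <- rnorm(n)

cor_mat <- cor(regs, method = "pearson")

# Flag each pair once (upper triangle) whose absolute correlation exceeds 0.75
idx <- which(abs(cor_mat) > 0.75 & upper.tri(cor_mat), arr.ind = TRUE)
collinear_pairs <- data.frame(var1 = rownames(cor_mat)[idx[, "row"]],
                              var2 = colnames(cor_mat)[idx[, "col"]],
                              r    = cor_mat[idx])
collinear_pairs
```

With the real data, the same filter could be applied to the output of `cor(corr_analysis_countriesG1, method = "pearson")` after dropping the `GDP_per_capita` row and column, since the dependent variable is not a regressor.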
### Assumption 2: Linearity
Linearity means the dependent variable is a linear combination of the regression coefficients and predictor variables, i.e. the model is linear in its coefficients (here, log GDP per capita is linear in the logged growth determinants). This ensures we are using the correct functional form to model the relationship between the dependent variable (GDP per capita) and the predictor variables (the economic growth determinants). A plot of the residuals against the fitted (predicted) log GDP per capita values from the regression model is used to test for linearity. If the residuals are scattered fairly evenly around the zero line, with no systematic curve or trend, the model exhibits an acceptable degree of linearity.
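One way to make the visual residual check more concrete is to average the residuals within bins of the fitted values: under a correct functional form every bin average should be close to zero, while a curved relationship fitted with a straight line leaves a systematic U-shape. The sketch below uses hypothetical simulated data and plain `lm` fits rather than the paper's panel models, and the helper `binned_resid` is our own illustration, not part of any package.

```r
set.seed(3)
x <- runif(300, 0, 3)
y_lin  <- 1 + 2 * x   + rnorm(300, sd = 0.3)   # truly linear relationship
y_quad <- 1 + 2 * x^2 + rnorm(300, sd = 0.3)   # curved: a straight-line fit is misspecified

# Hypothetical helper: average residual within quantile bins of the fitted values
binned_resid <- function(fit, bins = 5) {
  f <- fitted(fit)
  grp <- cut(f, breaks = quantile(f, probs = seq(0, 1, length.out = bins + 1)),
             include.lowest = TRUE)
  tapply(residuals(fit), grp, mean)
}

fit_lin  <- lm(y_lin ~ x)
fit_quad <- lm(y_quad ~ x)
round(binned_resid(fit_lin), 2)    # all bin averages close to zero
round(binned_resid(fit_quad), 2)   # positive at both ends, negative in the middle
```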
### Assumption 3: Normality of error terms
Non-normal error terms can invalidate the t-tests and confidence intervals for the regression coefficients in small samples, which affects whether a growth determinant is judged statistically significant. To assess the normality of the error terms, a histogram and a Q-Q plot of the residuals are used. If the errors are normally distributed, the histogram should look roughly bell-shaped. A Q-Q plot is a scatterplot of two sets of quantiles against one another, here the sample quantiles of the residuals against the theoretical quantiles of a normal distribution. If both sets of quantiles come from the same distribution, i.e. the error terms are normally distributed, the points should lie roughly on a straight line.
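The sketch below, on hypothetical simulated residuals, shows what `qqnorm` plots: the sorted residuals against normal quantiles evaluated at the plotting positions `ppoints(n)`. The correlation between the two sets of quantiles is used here only as an informal summary of how straight the Q-Q plot is; it is much closer to 1 for normal residuals than for right-skewed ones.

```r
set.seed(11)
e_norm <- rnorm(200)        # hypothetical residuals drawn from a normal distribution
e_skew <- rexp(200) - 1     # hypothetical right-skewed residuals with mean zero

qq_norm <- qqnorm(e_norm, plot.it = FALSE)   # list(x = theoretical quantiles, y = residuals)
qq_skew <- qqnorm(e_skew, plot.it = FALSE)

# The theoretical quantiles are normal quantiles at the plotting positions ppoints(n)
all.equal(sort(qq_norm$x), qnorm(ppoints(200)))

# Informal straightness summary: correlation between the two sets of quantiles
c(normal = cor(qq_norm$x, qq_norm$y), skewed = cor(qq_skew$x, qq_skew$y))
```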
### Assumption 4: Independent Error Terms (No Autocorrelation)
Autocorrelation is the degree of similarity between a time series and a lagged version of itself over successive time intervals; in this panel setting, it means a country's error term in one year is correlated with its error terms in earlier years. If a regression model exhibits autocorrelation, the conventional standard errors of the coefficients are unreliable, which affects whether a regressor is judged statistically significant. The Breusch-Godfrey test (Breusch, 1978), implemented for panel models by the pbgtest function from the plm package, is used to test for autocorrelated errors in the regression model. The Breusch-Godfrey test has the following null and alternative hypotheses:
$H_0$: no autocorrelation exists
$H_1$: autocorrelation exists
If the p-value < 0.05, then at the 5% significance level the null hypothesis is rejected, and we conclude that autocorrelation exists.
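To show the idea behind the Breusch-Godfrey test, the sketch below computes a simplified single time series version by hand in base R, assuming hypothetical simulated data with AR(1) errors and testing one lag. The residuals are regressed on the original regressor plus their own first lag, and the statistic n·R² is compared with a chi-squared distribution with one degree of freedom. `pbgtest` applies a test of this type to the residuals of the panel model, so its exact statistic will differ from this sketch.

```r
set.seed(5)
n <- 200
x <- rnorm(n)
u <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))   # AR(1) errors: serially correlated
y <- 1 + 0.5 * x + u

fit <- lm(y ~ x)
e <- residuals(fit)

# Auxiliary regression: residuals on the original regressor plus the first lag of the residuals
# (the unavailable lag for the first observation is set to 0)
e_lag1 <- c(0, e[-n])
aux <- lm(e ~ x + e_lag1)

LM <- n * summary(aux)$r.squared                    # LM statistic = n * R^2
p_value <- pchisq(LM, df = 1, lower.tail = FALSE)   # chi-squared with 1 df (one lag tested)
c(LM = LM, p = p_value)
```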
### Assumption 5: Constant Error Variance (Homoskedasticity)
Heteroskedasticity occurs when the variance of the error term is not constant, for example when it changes with the values of the independent variable(s) or over time. Under homoskedasticity (together with the other classical assumptions), the Gauss-Markov theorem ensures that the OLS estimator is the best linear unbiased estimator (BLUE). Homoskedasticity requires the error variance to be constant and independent of the covariates, which means that the distribution of the response variable has the same variance regardless of the values of the covariates. Mathematically:
$$E(e \mid X)=0$$
$$E(e^2 \mid X)=σ^2$$
$$∴Var(e \mid X)=E(e^2 \mid X)-[E(e \mid X)]^2=σ^2$$
If heteroskedasticity is present, the conventional standard errors of the regressor coefficients are unreliable, which affects whether a regressor is judged statistically significant. The Breusch-Pagan test (Breusch and Pagan, 1979) is used to test for heteroskedasticity; it has the following null and alternative hypotheses:
$H_0$: the error variances are all equal
$H_1$: the error variances are not equal
If the p-value < 0.05, then at the 5% significance level the null hypothesis is rejected, and we conclude that heteroskedasticity exists.
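A minimal hand-rolled version of the studentized (Koenker) Breusch-Pagan statistic is sketched below on hypothetical simulated data: the squared OLS residuals are regressed on the regressor, and n·R² is compared with a chi-squared distribution whose degrees of freedom equal the number of variance regressors. As far as we know this corresponds to the default `studentize = TRUE` in `lmtest::bptest`; the helper name `bp_koenker` is hypothetical.

```r
set.seed(9)
n <- 300
x <- runif(n, 1, 5)
y_homo   <- 2 + 0.5 * x + rnorm(n, sd = 1)         # constant error variance
y_hetero <- 2 + 0.5 * x + rnorm(n, sd = 0.5 * x)   # error SD grows with x

# Hypothetical helper: n * R^2 from regressing squared residuals on the regressor
bp_koenker <- function(y, x) {
  e2 <- residuals(lm(y ~ x))^2
  LM <- length(y) * summary(lm(e2 ~ x))$r.squared
  c(LM = LM, p = pchisq(LM, df = 1, lower.tail = FALSE))   # df = 1 variance regressor
}

bp_koenker(y_homo, x)
bp_koenker(y_hetero, x)   # small p-value: reject equal error variances
```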
Most of the fixed effects regression models in this paper exhibit both heteroskedasticity and autocorrelation. This means that while the regressor coefficients remain unbiased, the conventional standard errors are wrong (usually understating the true uncertainty). To correct for this, clustered standard errors are used, which are a form of heteroskedasticity- and autocorrelation-consistent standard errors. Clustered standard errors allow the regression error terms to have an arbitrary correlation within a group (here, within each country over time) but assume that the errors are uncorrelated across groups. They remain valid whether there is heteroskedasticity, autocorrelation or both, provided the number of clusters is reasonably large. In R, they are computed with vcovHC, a generic function from the sandwich package; for plm models the method supplied by the plm package is used, which by default (method = "arellano") clusters by country.
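The sketch below computes a cluster-robust (sandwich) variance matrix by hand for a plain OLS regression, to show what allowing arbitrary correlation within a group means in practice. It assumes hypothetical simulated data with 50 clusters of 20 observations, where both the regressor and the error contain a cluster-level component, and it applies no small-sample correction (the CR0 form). `vcovHC` for `plm` models and `sandwich::vcovCL` for `lm` models compute estimators of this type with optional adjustments.

```r
set.seed(21)
G <- 50; m <- 20                        # hypothetical: 50 clusters (countries) of 20 observations
cluster <- rep(seq_len(G), each = m)
x <- rnorm(G)[cluster] + rnorm(G * m)   # regressor with a cluster-level component
u <- rnorm(G)[cluster] + rnorm(G * m)   # error correlated within clusters
y <- 1 + 0.5 * x + u

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)

bread  <- solve(crossprod(X))               # (X'X)^{-1}
scores <- rowsum(X * e, group = cluster)    # one row per cluster: X_g' e_g
meat   <- crossprod(scores)                 # sum over clusters of (X_g' e_g)(X_g' e_g)'

vcov_cluster    <- bread %*% meat %*% bread # CR0 cluster-robust variance matrix
se_cluster      <- sqrt(diag(vcov_cluster))
se_conventional <- sqrt(diag(vcov(fit)))
rbind(conventional = se_conventional, clustered = se_cluster)
```

In this setting the clustered standard error of the slope is noticeably larger than the conventional one, illustrating how ignoring within-cluster correlation tends to understate uncertainty.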
### Assumption 6: No Endogeneity
Despite using the most important economic growth determinants as controls for one another and including country and time fixed effects, the regression models cannot control for omitted variables that vary both across countries and over time; if such variables are correlated with the regressors, the coefficient estimates are biased. This paper assumes that the overall impact of such omitted variables is small.
# Regression Analysis
We now test whether the regression models satisfy the necessary assumptions.
## Most Developed (Group 1) Countries
The initial fixed effects regression model for group 1 countries is given by the following code:
```{r}
reg_G1_fixed <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G1_countries, index=c("country", "year"), model="within", effect="twoways")
```
### Assumption 1: Multicollinearity
The Pearson correlation coefficient between regressors is used to test for multicollinearity and keep only those growth determinants with a correlation coefficient less than 0.75.
```{r}
corr_analysis_countriesG1 = countries_cleaned %>% filter(country %in% c('Austria', 'Belgium', 'Switzerland', 'Czech Republic', 'Germany', 'Denmark', 'Finland', 'France', 'United Kingdom', 'Hungary', 'Ireland', 'Italy', 'Japan', 'Mexico', 'Singapore', 'Slovenia', 'Sweden', 'United States', 'Luxembourg')) %>% mutate(GDP_per_capita = log(GDP_per_capita)) %>% mutate(investment = log(investment)) %>% mutate(EFI_score = log(EFI_score)) %>% mutate(IDI_score = log(IDI_score)) %>% mutate(HCI_score = log(HCI_score)) %>% select(-year) %>% select(-country)
cor(corr_analysis_countriesG1, method="pearson")
```
As all pairwise correlations between the independent variables are below 0.75, no growth determinants need to be removed.
### Assumption 2: Linearity
We plot the residual vs fitted values to test for linearity in the regression model.
```{r}
# Fitted value = observed log GDP per capita minus the within-model residual (assumes plm dropped no rows)
resid_df_G1 <- data.frame(fitted = as.numeric(log(G1_countries$GDP_per_capita) - residuals(reg_G1_fixed)), resid = as.numeric(residuals(reg_G1_fixed)))
resid_plot_G1_fixed = ggplot(resid_df_G1, aes(x = fitted, y = resid)) + geom_point() + geom_hline(yintercept = 0) + labs(x = "Fitted value (predicted log GDP per capita from Group 1 countries fixed effects regression model)", y = "Residual")
resid_plot_G1_fixed
```
As the residual vs fitted plot shows, the residuals are spread fairly evenly around the zero line, indicating that the model has an appropriate functional form.
### Assumption 3: Normality of error terms
Plot of histogram of the error terms:
```{r}
hist(residuals(reg_G1_fixed))
```
Q-Q plot of error terms:
```{r}
qqnorm(residuals(reg_G1_fixed), ylab = 'Residuals')
```
The error terms of the regression model display a roughly normal distribution as shown by the histogram and Q-Q plot.
### Assumption 4: Independent Error Terms (No Autocorrelation) Test
The Breusch-Godfrey test returns:
```{r}
pbgtest(reg_G1_fixed)
```
Therefore, at the 5% significance level there is sufficient evidence to reject the null hypothesis and conclude that the regression model exhibits autocorrelation, which is corrected for using clustered standard errors.
### Assumption 5: Constant Error Variance (Homoskedasticity) Test
The Breusch-Pagan test returns:
```{r}
bptest(reg_G1_fixed)
```
Therefore, at the 5% significance level there is insufficient evidence to reject the null hypothesis, so we find no evidence of heteroskedasticity in this regression model.
## Developed (Group 2) Countries
Our initial fixed effects regression model for group 2 countries is given by the following code:
```{r}
reg_G2_fixed <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G2_countries, index=c("country", "year"), model="within", effect="twoways")
```
### Assumption 1: Multicollinearity
The Pearson correlation coefficient between regressors is used to test for multicollinearity and keep only those growth determinants with a correlation coefficient less than 0.75.
```{r}
corr_analysis_countriesG2 = countries_cleaned %>% filter(country %in% c('Bulgaria', 'Canada', 'Cyprus', 'Spain', 'Estonia', 'Croatia', 'Israel', 'Lithuania', 'Latvia', 'Malaysia', 'Netherlands', 'Norway', 'Panama', 'Poland', 'Portugal', 'Romania', 'Thailand', 'Turkey', 'Iceland', 'Malta')) %>% mutate(GDP_per_capita = log(GDP_per_capita)) %>% mutate(investment = log(investment)) %>% mutate(EFI_score = log(EFI_score)) %>% mutate(IDI_score = log(IDI_score)) %>% mutate(HCI_score = log(HCI_score)) %>% select(-year) %>% select(-country)
cor(corr_analysis_countriesG2, method="pearson")
```
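HCI has a correlation coefficient above 0.75 with at least one other regressor, so it is removed from the regression model due to multicollinearity: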
```{r}
reg_G2_fixed <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score), data=G2_countries, index=c("country", "year"), model="within", effect="twoways")
```
### Assumption 2: Linearity
We plot the residual vs fitted values to test for linearity in the regression model.
```{r}
# Fitted value = observed log GDP per capita minus the within-model residual (assumes plm dropped no rows)
resid_df_G2 <- data.frame(fitted = as.numeric(log(G2_countries$GDP_per_capita) - residuals(reg_G2_fixed)), resid = as.numeric(residuals(reg_G2_fixed)))
resid_plot_G2_fixed = ggplot(resid_df_G2, aes(x = fitted, y = resid)) + geom_point() + geom_hline(yintercept = 0) + labs(x = "Fitted value (predicted log GDP per capita from Group 2 countries fixed effects regression model)", y = "Residual")
resid_plot_G2_fixed
```
As the residual vs fitted plot shows, the residuals are spread fairly evenly around the zero line, indicating that the model has an appropriate functional form.
### Assumption 3: Normality of error terms
Plot of histogram of the error terms:
```{r}
hist(residuals(reg_G2_fixed))
```
Q-Q plot of error terms:
```{r}
qqnorm(residuals(reg_G2_fixed), ylab = 'Residuals')
```
The error terms of the regression model display a roughly normal distribution as shown by the histogram and Q-Q plot.
### Assumption 4: Independent Error Terms (No Autocorrelation) Test
The Breusch-Godfrey test returns:
```{r}
pbgtest(reg_G2_fixed)
```
Therefore, at the 5% significance level there is sufficient evidence to reject the null hypothesis and conclude that the regression model exhibits autocorrelation, which is corrected for using clustered standard errors.
### Assumption 5: Constant Error Variance (Homoskedasticity) Test
The Breusch-Pagan test returns:
```{r}
bptest(reg_G2_fixed)
```
Therefore, at the 5% significance level there is sufficient evidence to reject the null hypothesis and conclude that the regression model exhibits heteroskedasticity, which is corrected for using clustered standard errors.
## Less Developed (Group 3) Countries
Our initial fixed effects regression model for group 3 countries is given by the following code:
```{r}
reg_G3_fixed <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G3_countries, index=c("country", "year"), model="within", effect="twoways")
```
### Assumption 1: Multicollinearity
The Pearson correlation coefficient between regressors is used to test for multicollinearity and keep only those growth determinants with a correlation coefficient less than 0.75.
```{r}
corr_analysis_countriesG3 = countries_cleaned %>% filter(country %in% c('Albania', 'United Arab Emirates', 'Argentina', 'Australia', 'Bahrain', 'Brazil', 'Chile', 'Colombia', 'Costa Rica', 'Georgia', 'Greece', 'Indonesia', 'Jordan', 'Moldova', 'Mauritius', 'Namibia', 'New Zealand', 'Saudi Arabia', 'Tunisia', 'Ukraine', 'Uruguay', 'South Africa')) %>% mutate(GDP_per_capita = log(GDP_per_capita)) %>% mutate(investment = log(investment)) %>% mutate(EFI_score = log(EFI_score)) %>% mutate(IDI_score = log(IDI_score)) %>% mutate(HCI_score = log(HCI_score)) %>% select(-year) %>% select(-country)
cor(corr_analysis_countriesG3, method="pearson")
```
As all pairwise correlations between the independent variables are below 0.75, no variables are removed.
### Assumption 2: Linearity
We plot the residual vs fitted values to test for linearity in the regression model.
```{r}
# Fitted value = observed log GDP per capita minus the within-model residual (assumes plm dropped no rows)
resid_df_G3 <- data.frame(fitted = as.numeric(log(G3_countries$GDP_per_capita) - residuals(reg_G3_fixed)), resid = as.numeric(residuals(reg_G3_fixed)))
resid_plot_G3_fixed = ggplot(resid_df_G3, aes(x = fitted, y = resid)) + geom_point() + geom_hline(yintercept = 0) + labs(x = "Fitted value (predicted log GDP per capita from Group 3 countries fixed effects regression model)", y = "Residual")
resid_plot_G3_fixed
```
As the residual vs fitted plot shows, the residuals are spread fairly evenly around the zero line, indicating that the model has an appropriate functional form.
### Assumption 3: Normality of error terms
Plot of histogram of the error terms:
```{r}
hist(residuals(reg_G3_fixed))
```
Q-Q plot of error terms:
```{r}
qqnorm(residuals(reg_G3_fixed), ylab = 'Residuals')
```
The error terms of the regression model display a roughly normal distribution as shown by the histogram and Q-Q plot.
### Assumption 4: Independent Error Terms (No Autocorrelation) Test
The Breusch-Godfrey test returns:
```{r}
pbgtest(reg_G3_fixed)
```
Therefore, at the 5% significance level there is sufficient evidence to reject the null hypothesis and conclude that the regression model exhibits autocorrelation, which is corrected for using clustered standard errors.
### Assumption 5: Constant Error Variance (Homoskedasticity) Test
The Breusch-Pagan test returns:
```{r}
bptest(reg_G3_fixed)
```
Therefore, at the 5% significance level there is sufficient evidence to reject the null hypothesis and conclude that the regression model exhibits heteroskedasticity, which is corrected for using clustered standard errors.
## Least Developed (Group 4) Countries
Our initial fixed effects regression model for group 4 countries is given by the following code:
```{r}
reg_G4_fixed <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(IDI_score) + log(HCI_score), data=G4_countries, index=c("country", "year"), model="within", effect="twoways")
```
### Assumption 1: Multicollinearity
The Pearson correlation coefficient between regressors is used to test for multicollinearity and keep only those growth determinants with a correlation coefficient less than 0.75.
```{r}
corr_analysis_countriesG4 = countries_cleaned %>% filter(country %in% c('Azerbaijan', 'Burkina Faso', 'Botswana', 'Cameroon', 'Algeria', 'Ecuador', 'Kazakhstan', 'Morocco', 'Madagascar', 'Oman', 'Peru', 'Paraguay', 'Senegal', 'Uganda', 'Zimbabwe')) %>% mutate(GDP_per_capita = log(GDP_per_capita)) %>% mutate(investment = log(investment)) %>% mutate(EFI_score = log(EFI_score)) %>% mutate(IDI_score = log(IDI_score)) %>% mutate(HCI_score = log(HCI_score)) %>% select(-year) %>% select(-country)
cor(corr_analysis_countriesG4, method="pearson")
```
As IDI has a correlation coefficient over 0.75 with both physical capital per capita and HCI, IDI is removed from the regression model.
Thus, the new fixed effects regression model is given by:
```{r}
reg_G4_fixed <- plm(log(GDP_per_capita) ~ log(investment) + log(EFI_score) + log(HCI_score), data=G4_countries, index=c("country", "year"), model="within", effect="twoways")
```
### Assumption 2: Linearity
We plot the residual vs fitted values to test for linearity in the regression model.
```{r}
# Fitted value = observed log GDP per capita minus the within-model residual (assumes plm dropped no rows)
resid_df_G4 <- data.frame(fitted = as.numeric(log(G4_countries$GDP_per_capita) - residuals(reg_G4_fixed)), resid = as.numeric(residuals(reg_G4_fixed)))
resid_plot_G4_fixed = ggplot(resid_df_G4, aes(x = fitted, y = resid)) + geom_point() + geom_hline(yintercept = 0) + labs(x = "Fitted value (predicted log GDP per capita from Group 4 countries fixed effects regression model)", y = "Residual")
resid_plot_G4_fixed
```
As the residual vs fitted plot shows, the residuals are spread fairly evenly around the zero line, indicating that the model has an appropriate functional form.
### Assumption 3: Normality of error terms
Plot of histogram of the error terms:
```{r}
hist(residuals(reg_G4_fixed))
```
Q-Q plot of error terms:
```{r}
qqnorm(residuals(reg_G4_fixed), ylab = 'Residuals')
```
The error terms of the regression model display a roughly normal distribution as shown by the histogram and Q-Q plot.
### Assumption 4: Independent Error Terms (No Autocorrelation) Test
The Breusch-Godfrey test returns:
```{r}
pbgtest(reg_G4_fixed)
```
Therefore, at the 5% significance level there is sufficient evidence to reject the null hypothesis and conclude that the regression model exhibits autocorrelation, which is corrected for using clustered standard errors.
### Assumption 5: Constant Error Variance (Homoskedasticity) Test
The Breusch-Pagan test returns:
```{r}
bptest(reg_G4_fixed)
```
Therefore, at the 5% significance level there is sufficient evidence to reject the null hypothesis and conclude that the regression model exhibits heteroskedasticity, which is corrected for using clustered standard errors.
# Results
As each regression model has now been adjusted in light of the assumption tests (removing collinear regressors and using clustered standard errors to account for heteroskedasticity and autocorrelation), the coefficients and standard errors can be interpreted with reasonable confidence. Note that the p-value for each independent variable tests the null hypothesis that its coefficient is zero, i.e. that, holding the other regressors and the fixed effects constant, changes in that variable have no association with changes in log GDP per capita. If the null hypothesis cannot be rejected, there is insufficient evidence to conclude that the variable has an effect at the population level. This paper uses a 5% significance level to test whether an independent variable is statistically significant.
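For reference, the sketch below shows how a t-statistic, two-sided p-value and 95% confidence interval follow from a coefficient and its (clustered) standard error, which is the kind of calculation `coeftest` reports. The inputs are hypothetical numbers chosen for illustration, not estimates from this paper.

```r
# Hypothetical inputs for illustration only (not estimates from this paper)
beta_hat <- 0.30   # estimated coefficient
se_hat   <- 0.12   # clustered standard error
df_resid <- 200    # residual degrees of freedom

t_stat  <- beta_hat / se_hat                                         # H0: beta = 0
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)    # two-sided p-value
ci_95   <- beta_hat + c(-1, 1) * qt(0.975, df = df_resid) * se_hat   # 95% confidence interval

c(t = t_stat, p = p_value)
ci_95
p_value < 0.05   # statistically significant at the 5% level
```

A coefficient is significant at the 5% level exactly when its 95% confidence interval excludes zero, so the two summaries always agree.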
## Most Developed (Group 1) Countries
The full summary table for all the regressor coefficients and supplementary statistics:
```{r}
summary(reg_G1_fixed)
```
Adjusting for autocorrelation by using clustered standard errors we have the final coefficient table:
```{r}
coeftest(reg_G1_fixed, vcov = vcovHC, type = "HC1")
```
We thus have the following regression model:
$$ln(y_{it})=0.478ln(k_{it})+0.356ln(h_{it})+0.111ln(z_{p_{it}})+0.345ln(z_{s_{it}})+α_i+δ_t+e_{it}$$
However, only physical capital per capita (investment) and physical productivity are statistically significant at the 5% level.
Therefore, in the most developed countries:
A 5% increase in investment is associated, on average, with a 2.36% increase (2dp) in GDP per capita holding all other variables constant.
A 5% increase in physical productivity is associated, on average, with a 0.54% increase (2dp) in GDP per capita holding all other variables constant.
The adjusted $R^2$ of the regression model is: 0.67005.
## Developed (Group 2) Countries
The full summary table for all the regressor coefficients and supplementary statistics:
```{r}
summary(reg_G2_fixed)
```
Adjusting for heteroskedasticity and autocorrelation by using clustered standard errors we have the final coefficient table:
```{r}
coeftest(reg_G2_fixed, vcov = vcovHC, type = "HC1")
```
We thus have the following regression model:
$$ln(y_{it})=0.485ln(k_{it})+0.369ln(z_{p_{it}})+0.232ln(z_{s_{it}})+α_i+δ_t+e_{it}$$
However, only physical capital per capita (investment) and physical productivity are statistically significant at the 5% level.
Therefore, in group 2 developed countries:
A 5% increase in investment is associated, on average, with a 2.39% increase (2dp) in GDP per capita holding all other variables constant.
A 5% increase in physical productivity is associated, on average, with a 1.82% increase (2dp) in GDP per capita holding all other variables constant.
The adjusted $R^2$ of the regression model is: 0.79469.
## Less Developed (Group 3) Countries
The full summary table for all the regressor coefficients and supplementary statistics:
```{r}
summary(reg_G3_fixed)
```
Adjusting for heteroskedasticity and autocorrelation by using clustered standard errors we have the final coefficient table:
```{r}
coeftest(reg_G3_fixed, vcov = vcovHC, type = "HC1")
```
We thus have the following regression model:
$$ln(y_{it})=0.560ln(k_{it})-0.181ln(h_{it})+0.410ln(z_{p_{it}})+0.006ln(z_{s_{it}})+α_i+δ_t+e_{it}$$
However, only physical capital per capita (investment) and physical productivity are statistically significant at the 5% level.
Therefore, in less developed countries:
A 5% increase in investment is associated, on average, with a 2.77% increase (2dp) in GDP per capita holding all other variables constant.
A 5% increase in physical productivity is associated, on average, with a 2.02% increase (2dp) in GDP per capita holding all other variables constant.
The adjusted $R^2$ of the regression model is: 0.79786.
## Least Developed (Group 4) Countries
The full summary table for all the regressor coefficients and supplementary statistics:
```{r}
summary(reg_G4_fixed)
```
Adjusting for heteroskedasticity and autocorrelation by using clustered standard errors we have the final coefficient table:
```{r}
coeftest(reg_G4_fixed, vcov = vcovHC, type = "HC1")
```
We thus have the following regression model (as physical productivity was dropped from the regression model due to multicollinearity):
$$ln(y_{it})=0.581ln(k_{it})+0.283ln(h_{it})+0.472ln(z_{s_{it}})+α_i+δ_t+e_{it}$$
However, only physical capital per capita (investment) and social productivity are statistically significant at the 5% level.
Therefore, in the least developed countries:
A 5% increase in investment is associated, on average, with a 2.88% increase (2dp) in GDP per capita holding all other variables constant.
A 5% increase in social productivity is associated, on average, with a 2.33% increase (2dp) in GDP per capita holding all other variables constant.
The adjusted $R^2$ of the regression model is: 0.76706.
## Results Summary
The associated impact on percentage growth in GDP per capita for a 5% increase in each economic growth determinant while holding all other determinants constant and accounting for country and time fixed effects is shown below for all 4 country groups.
```{r results='asis'}
summary_table <- data.frame(Economic_Growth_Determinant = c("Physical Capital per Capita", "Social Productivity", "Physical Productivity", "Human Capital per Capita"), Group4_Countries = c("2.88%", "2.33%", "Variable Removed", "Statistically Insignificant"), Group3_Countries = c("2.77%", "Statistically Insignificant", "2.02%", "Statistically Insignificant"), Group2_Countries = c("2.39%", "Statistically Insignificant", "1.82%", "Variable Removed"), Group1_Countries = c("2.36%", "Statistically Insignificant", "0.54%", "Statistically Insignificant"))
kable(summary_table, caption = 'Results Summary') %>% kable_styling()
```
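The percentages in this table appear to follow from the log-log specification: a 5% rise in a determinant multiplies GDP per capita by 1.05^β, i.e. an increase of (1.05^β − 1) × 100%. The sketch below reproduces every table entry from the rounded coefficients reported in the regression equations above; treating this as the calculation behind the table is our assumption, inferred from the matching numbers.

```r
# Coefficients as reported (rounded to 3 d.p.) in the regression equations above
coefs <- c(G1_investment = 0.478, G1_physical_productivity = 0.111,
           G2_investment = 0.485, G2_physical_productivity = 0.369,
           G3_investment = 0.560, G3_physical_productivity = 0.410,
           G4_investment = 0.581, G4_social_productivity   = 0.472)

# Log-log model: a 5% rise in a determinant multiplies GDP per capita by 1.05^beta
pct_effect <- (1.05^coefs - 1) * 100
round(pct_effect, 2)

# The linear approximation 5 * beta gives slightly larger figures
round(5 * coefs, 2)
```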
There are several interesting results from this study. The key points are:
• All economic growth determinants individually exhibit diminishing returns: every estimated output elasticity is below one, so a 5% increase in any single growth factor is associated with a less than 5% increase in output, ceteris paribus. This supports the view that economic growth determinants are complementary in generating growth, as assumed by the Cobb-Douglas production function used in neoclassical growth models.
• There is evidence consistent with conditional convergence. The estimated output elasticities of the statistically significant growth factors are larger in less developed countries, so, all else equal, these countries should grow faster than more developed countries in the long run.
• There are positive but diminishing marginal returns to physical capital as countries develop, as originally posited by the Solow model.
• Contrary to what is assumed by neoclassical production functions, this paper finds evidence that the returns to physical productivity are also positive but diminish as countries develop (at least in the short term), rather than physical productivity being an exogenous constant.
• Surprisingly, human capital appears to be a statistically insignificant growth determinant across most countries.
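The link between output elasticities below one, diminishing returns and complementarity can be illustrated with a simplified two-input Cobb-Douglas production function $Y = AK^{α}H^{β}$ with $0<α,β<1$ (an illustrative form, not the exact specification estimated above):
$$\frac{\partial Y}{\partial K}=α\frac{Y}{K}>0, \qquad \frac{\partial^2 Y}{\partial K^2}=α(α-1)\frac{Y}{K^2}<0, \qquad \frac{\partial^2 Y}{\partial K\,\partial H}=αβ\frac{Y}{KH}>0$$
The second derivative is negative because α < 1 (diminishing marginal returns to each input), while the positive cross-partial means that a higher level of one input raises the marginal product of the other (complementarity). In logs, $ln(Y)=ln(A)+α\,ln(K)+β\,ln(H)$, so α and β are elasticities of the same kind as those estimated by the log-log regressions above.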
### Role of Physical Capital
There is evidence of positive but diminishing marginal returns to physical capital as countries develop, which supports (labour-augmented) Solow-Swan growth models. Physical capital is also the only economic growth determinant that is statistically significant at every level of development.
### Role of Human Capital
Human capital appears to be a statistically insignificant growth determinant across most countries. However, this surprising result is likely due to two factors. The first is limited data: less data was available for the HCI than for any other growth factor. The HCI is a fairly new index with substantial amounts of missing data, which were interpolated via linear regression. This interpolation is likely to reduce the significance of human capital in the regression analysis, particularly in less developed economies, whose GDP per capita fluctuates much more from year to year than that of developed economies (whose growth is instead more linear). Nevertheless, the HCI is still the best available dataset for the period assessed in this paper, as it takes a more holistic view of human capital and has more frequent data than other publicly available human capital proxies. Second, the 13-year period covered by this paper is too short to fully observe the returns to output from improvements in human capital.
### Role of Physical Productivity
There are positive but diminishing marginal returns to physical productivity as countries develop. This is an important observation, as most economic growth models rely on exogenous technology improvements to drive growth, and such improvements have produced large growth during periods of great technological change such as the industrial revolution. However, for how long can economies rely on technology improvements? Are technology levels bounded? Technology still has a great deal of unfulfilled potential, and society is nowhere near any upper bound on technology levels, if one even exists. Over the short term, however, the gains from technological advancement appear to diminish with development. This makes intuitive sense: the gain from everyone getting access to a laptop when they previously had none is likely higher than the gain from everyone getting a faster laptop when they already had one. Over the longer term (likely decades), however, there are periods of major technological breakthroughs in which the returns on investment in technology are likely much higher, such as at the start of the industrial or digital revolutions. Essentially, when exploring (but likely not founding, due to potentially high start-up costs) a new technology, there is potential for higher returns on investment in technology, whereas simply improving pre-existing technologies is likely to yield lower returns. What likely happened over the 2007-2019 sample period was that no major breakthroughs became commercially available: developed countries simply improved pre-existing technologies, while developing countries acquired and caught up with the technology levels of more developed countries through imitation and imported products, both of which were easy during this period of hyperglobalization, connectivity and easy access to information.
This meant that the returns to physical productivity improvements were higher in less developed countries and thus diminished with development. However, further research is required to test this hypothesis.
### Role of Social Productivity
Social productivity is important in the least developed nations but becomes statistically insignificant in more developed countries. This may be because institutions matter more in the least developed economies, which often face significant problems with corruption, weak property rights, war and civil unrest. Once the basic building blocks of good institutions have been established, such as those that allow ease of trade, effective governance and secure property rights, further improvements have no statistically significant associated impact on economic growth. In other words, social productivity is very important in the poorest nations, but once an acceptable threshold is met, further improvements matter much less. Intuitively this makes sense, as there is likely an upper bound on social productivity, or rapidly diminishing returns past a certain threshold: there are limits to how far one can improve how well a society is run or how conducive its culture is to efficiency, and past some threshold further advances are likely negligible.
# Research and Policy Recommendations
## Research Recommendations
As mentioned throughout this paper, there are some areas where further research is required. The first is to repeat the regression analysis with a more complete human capital dataset, which will likely only become available with time as more data is recorded. Repeating the analysis over a longer period would also help to fully observe the impact of human capital on output. Second, further research could test the hypotheses outlined in this paper: the proposed explanation for the positive but diminishing short-run returns to physical productivity compared with its impact over the long run, and whether social productivity is only important for growth until a certain threshold is reached.
## Policy Recommendations
When issuing policy advice, it should be reiterated that the regression analysis does not establish a causal relationship. Rather, it quantifies the impact on output associated with increasing an economic growth determinant in countries at different levels of development. Nevertheless, the results of this study suggest that the least developed countries should focus on institutional and, where possible, cultural reforms to improve their social productivity. Meanwhile, all more developed nations should focus on investment and physical productivity, with investment always having a relatively larger impact on output. This intuitively makes sense, as investment can boost output immediately, while the impact of physical productivity on output takes longer to observe (although physical productivity is arguably more important over the long run). Note also that the results of this paper do not necessarily mean that human capital is irrelevant; as explained previously, more complete datasets on human capital are needed to confirm the statistical insignificance observed in this study’s dataset.
# Conclusion
Overall, this paper sought to analyze, using fixed effects multiple linear regression analysis, how the factors that influence economic growth change as economies develop and undergo structural change. The results lend support to existing economic growth models that assume complementary growth factors, some of which have positive but diminishing marginal returns, like physical capital in the Solow-Swan model. However, because this paper studies the role of TFP in more depth, its results do not fully support a model like Solow-Swan. If TFP behaves in the long run as it is found to behave in the short run in this study, then physical productivity, and with it TFP (if social productivity becomes insignificant), would have positive but diminishing marginal returns as countries develop. TFP could then potentially be endogenized into the growth model, or at least would not be the exogenously given value assumed in neoclassical production functions like Solow-Swan. In addition, this paper has shown merit in dividing TFP into its two components, physical and social productivity, as they evolve differently and differ in relative importance depending on an economy’s level of development. Ultimately, this study has shown the relative importance of different economic growth determinants as countries develop and indicates to policy makers in those countries which growth factors they should focus on. However, as this paper suggests, growth factors are complementary. While a country can pay particular attention to its relatively most important growth factor, this should not divert attention from other growth factors, as improvements are needed on all fronts to sustain economic growth, particularly if output is to rise by more than the overall increase in growth determinants.
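As a brief illustration of "positive but diminishing marginal returns", consider the textbook Cobb-Douglas production function commonly paired with the Solow-Swan model, $Y = A K^{\alpha} L^{1-\alpha}$ with $0 < \alpha < 1$, where $A$ is TFP and is treated as exogenous. These are the standard textbook symbols and may not match the notation used elsewhere in this paper. Differentiating with respect to capital gives:

$$
\frac{\partial Y}{\partial K} = \alpha A K^{\alpha-1} L^{1-\alpha} > 0,
\qquad
\frac{\partial^2 Y}{\partial K^2} = \alpha(\alpha-1) A K^{\alpha-2} L^{1-\alpha} < 0
$$

The first derivative is positive and the second is negative because $\alpha - 1 < 0$. The hypothesis above asks whether physical productivity, and through it $A$, shows a similar pattern as countries develop, rather than entering only as an exogenous shifter.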
# References
Barossi-Filho, M., Silva, R. and Diniz, E. (2005). “The Empirics of the Solow Growth Model: Long-Term Evidence”. Journal of Applied Economics, 8(1): 31-51.

Breusch, T.S. (1978). “Testing for Autocorrelation in Dynamic Linear Models”. Australian Economic Papers, 17: 334-355.

Breusch, T.S. and Pagan, A.R. (1979). “A Simple Test for Heteroscedasticity and Random Coefficient Variation”. Econometrica, 47(5): 1287-1294.