This repository contains the code and instructions for reproducing the results of the paper "Machine-learned patterns suggest that diversification drives economic development" by Charles D. Brummitt, Andrés Gómez-Liévano, Ricardo Hausmann, and Matthew H. Bonds.
The export data used in this paper are available for download at Dataverse. Use that version of the dataset to reproduce the results.
To study the most recent version of this dataset and of related ones, go to the downloads page at the Harvard Atlas of Economic Complexity.
The results of this paper were generated mostly with Python 3.5.3 and, in a few minor parts, with R.
Clone this repository to your local machine. For the bulk of the results, create a virtual environment using your favorite tool for Python (e.g., virtualenv or Anaconda), and then install the requirements in requirements.txt. For example, with Anaconda:
# Clone the repo:
git clone https://github.com/cbrummitt/machine_learned_patterns_in_economic_development.git
# Create a virtual environment:
conda create --name id_pat_econ_dev python=3.5.3
# Install pip inside that virtual environment
conda install -n id_pat_econ_dev pip
# Activate the virtual environment
source activate id_pat_econ_dev
# Install the requirements from inside the cloned repository
cd machine_learned_patterns_in_economic_development
pip install -r requirements.txt
When you're done using the conda environment id_pat_econ_dev, deactivate it by running:
source deactivate
For a minimal example of using the codebase, open the notebook Minimal_example.ipynb inside the notebooks directory.
The Jupyter notebook Create_figures.ipynb in the notebooks folder uses the scripts in the scripts folder to create most of the figures in the paper and to run the experiments. The fitted GAM is stored in the folder notebooks/results.
The notebook robustness_checks_phi_0.ipynb conducts the two robustness checks described in the SI, in which the score on the first principal component is substituted with total export value per capita or with diversification (defined as the number of products with revealed comparative advantage greater than one).
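As a rough illustration of the diversification measure mentioned above, the following sketch counts, for each country, the products whose revealed comparative advantage (RCA) exceeds one. It assumes the standard Balassa definition of RCA (a product's share in a country's exports divided by that product's share in world exports). The function names and the toy export matrix are hypothetical and are not taken from this repository; the notebook's own implementation may differ.

```python
import numpy as np


def revealed_comparative_advantage(exports):
    """Balassa RCA for a (countries x products) matrix of export values."""
    exports = np.asarray(exports, dtype=float)
    country_totals = exports.sum(axis=1, keepdims=True)  # total exports of each country
    product_totals = exports.sum(axis=0, keepdims=True)  # world exports of each product
    world_total = exports.sum()                          # total world exports
    # Share of product p in country c's exports, divided by
    # the share of product p in world exports.
    with np.errstate(divide="ignore", invalid="ignore"):
        rca = (exports / country_totals) / (product_totals / world_total)
    # Rows or columns that sum to zero give NaN or inf; treat them as no advantage.
    return np.nan_to_num(rca, nan=0.0, posinf=0.0, neginf=0.0)


def diversification(exports, threshold=1.0):
    """Number of products per country with RCA strictly greater than threshold."""
    return (revealed_comparative_advantage(exports) > threshold).sum(axis=1)


# Toy data (hypothetical values): 3 countries x 4 products.
toy_exports = np.array([
    [10.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 5.0],
    [0.0, 1.0, 2.0, 7.0],
])
print(diversification(toy_exports))
```

In this toy matrix the first country exports a single product, so its diversification is lower than that of the other two countries.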
The folder "notebooks/use mgcv and R to determine how to transform the target" contains a single CSV file, Rpop__data_target__pca_2__target_is_difference_True.csv, which is created by the Jupyter notebook Create_figures.ipynb. This CSV file contains the preprocessed data and is imported into R for analysis with the package mgcv.
The R Markdown file "Determine how to transform the target.Rmd" fits the three GAMs on the preprocessed data, with and without a square-root transformation of the target. It uses the package mgcv to compute quantile-quantile plots and residual plots, which show how far the target is from normally distributed.
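The idea behind a quantile-quantile diagnostic can be sketched in Python, the language used for most of this repository. This is not the R/mgcv analysis itself: it only compares sorted, standardized values with theoretical normal quantiles and summarizes the agreement with a correlation coefficient. The helper names, the synthetic data, and the sign-preserving square root (used here because a target defined as a difference can be negative) are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical normal quantiles, standardized sorted sample)."""
    n = len(sample)
    mu, sigma = mean(sample), stdev(sample)
    observed = [(x - mu) / sigma for x in sorted(sample)]
    # Plotting positions (i + 0.5) / n avoid the undefined quantiles at 0 and 1.
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed


def qq_correlation(sample):
    """Pearson correlation of the QQ points; values near 1 suggest normality."""
    t, o = qq_points(sample)
    mt, mo = mean(t), mean(o)
    cov = sum((a - mt) * (b - mo) for a, b in zip(t, o))
    var_t = sum((a - mt) ** 2 for a in t)
    var_o = sum((b - mo) ** 2 for b in o)
    return cov / math.sqrt(var_t * var_o)


def signed_sqrt(x):
    """Square root that keeps the sign of x (an assumption for negative targets)."""
    return math.copysign(math.sqrt(abs(x)), x)


# Synthetic, heavy-tailed "target" (hypothetical data, not from the paper).
rng = random.Random(0)
raw = [z * abs(z) for z in (rng.gauss(0.0, 1.0) for _ in range(500))]
transformed = [signed_sqrt(x) for x in raw]

print("QQ correlation, raw target:        ", round(qq_correlation(raw), 3))
print("QQ correlation, transformed target:", round(qq_correlation(transformed), 3))
```

A QQ correlation closer to 1 after the transformation indicates that the transformed values lie closer to a straight line on a quantile-quantile plot.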
Figures SI-7 through SI-10 are made by the R script notebooks/scriptsR/PriSDA_correlations_with_PC0.R.