- Folder and file structure
- Task details
2.0. Task 00_documentation
2.1. Task 01_data
2.2. Task 02_simulation
2.3. Task 03_export_tables
2.4. Task 04_repo_update
- Generating flowcharts
Each folder starting with exactly two digits is a task folder. A task folder is organized as a small project, to separate tasks such as generating the dataset, running the simulation, and producing the learning poverty briefs. Whenever possible, we strive for consistent naming of subfolders: most tasks 0i have the subfolders 0i1_rawdata, 0i2_programs and 0i3_outputs.
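The naming pattern can be illustrated with a few lines of Python. This is only a sketch of the convention described above: the helper name `expected_subfolders` is hypothetical and is not part of the repository.

```python
import re


def expected_subfolders(task_folder: str) -> list[str]:
    """Return the conventional subfolder names for a task folder such as '01_data'."""
    # A task folder starts with exactly two digits (a third digit means a subfolder).
    match = re.match(r"^(\d{2})(?!\d)", task_folder)
    if match is None:
        raise ValueError(f"not a task folder: {task_folder!r}")
    prefix = match.group(1)
    # Subfolders append a third digit to the two-digit task prefix.
    names = ("rawdata", "programs", "outputs")
    return [f"{prefix}{i}_{name}" for i, name in enumerate(names, start=1)]


print(expected_subfolders("01_data"))
# ['011_rawdata', '012_programs', '013_outputs']
```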
All the code needed to process and copy the input datasets to the local clone, and to generate all datasets from them, is shared through this repository. The code can always be found in the programs subfolder of each task, and each program is numbered with a prefix that matches the folder number. This makes it possible to identify immediately where each program fits in the workflow by looking at its name.
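As an illustration of how a program name maps back to its place in the workflow, the sketch below splits a numbered name into its task prefix and folder number. It assumes that program names start with four digits followed by an underscore, as in the names that appear in the flowchart on this page (for example 0123_combine_enrollment_data); the function `locate_program` is hypothetical.

```python
import re


def locate_program(program_name: str) -> tuple[str, str]:
    """Return (task prefix, folder number) for a name like '0123_combine_enrollment_data'."""
    # Assumed layout: two digits for the task, one for the subfolder, one for the step.
    match = re.match(r"^(\d{2})(\d)(\d)_", program_name)
    if match is None:
        raise ValueError(f"unexpected program name: {program_name!r}")
    task, subfolder, _step = match.groups()
    return task, task + subfolder


print(locate_program("0123_combine_enrollment_data"))
# ('01', '012')
```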
Some files that are neither code nor datasets (PDFs, presentations, etc.) are shared directly through the OneDrive folder, as such files are not suitable for sharing over GitHub. This folder is restricted to contributors in the World Bank.
Folders that would otherwise start empty, with no files that we wish to track in the repo, contain a placeholder markdown file. Its only purpose is to synchronize the folder structure, because Git does not track empty folders.
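A minimal sketch of how such placeholders could be generated is shown below. The file name `placeholder.md` and its contents are assumptions made for illustration; the repository may use a different name.

```python
from pathlib import Path


def add_placeholders(root: Path, filename: str = "placeholder.md") -> list[Path]:
    """Create a placeholder file in every empty folder under root and return the new paths."""
    created = []
    # Build the folder list first so that newly written files do not affect the walk.
    folders = sorted(p for p in root.rglob("*") if p.is_dir())
    for folder in folders:
        if ".git" in folder.relative_to(root).parts:
            continue  # never touch Git's own metadata
        if not any(folder.iterdir()):
            placeholder = folder / filename  # hypothetical file name
            placeholder.write_text("This file keeps the folder under version control.\n")
            created.append(placeholder)
    return created
```

Calling `add_placeholders(Path("."))` from the root of a clone would add a file only to folders that are completely empty, so folders with tracked content are left untouched.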
The 00_documentation folder contains only the markdown files that document this project, plus accompanying images.
| Sub-Folder Name | Usage |
|---|---|
| 001_technical_note | Information on how the data was calculated and which sources were used |
| 002_repo_structure | Guide to the folder structure and data flow in the project |
| 003_contribution_and_replication | Guidelines for contributing to and replicating this repo |
In this task folder we generate a "picture" of learning poverty in 2015 (rawlatest), and all files needed to project learning poverty, which will be used in the simulation task. This task runs exclusively in Stata.
| Sub-Folder Name | Usage |
|---|---|
| 011_rawdata | This folder starts empty, except for the subfolder hosted_in_repo, which contains 13 .csv and .md files. |
| 012_programs | Programs that compile all data on the recent history and a current picture of learning poverty |
| 013_outputs | This folder should start empty. It will store the outputs for the data task. |
For each relevant file in 013_outputs, we generate markdown documentation, accessible through the links below:
- Documentation of population
- Documentation of enrollment
- Documentation of proficiency
- Documentation of rawfull
- Documentation of rawlatest
All the data needed for this project comes from thirteen .csv and .md files in 01_data/011_rawdata/hosted_in_repo/. Those files are first imported into 01_data/011_rawdata/ as .dta files, then combined into intermediate datasets. Population, enrollment and proficiency datasets are created in 01_data/013_outputs/, then combined in an exhaustive manner into rawfull, also stored in 01_data/013_outputs/.
From rawfull, we construct multiple preference datasets, each the result of trimming down rawfull through the preferred_list program. A useful analogy is that each preference is a "picture" of the world, taken with different camera adjustments and angles. Then, we display the global and regional numbers that each preference represents. Lastly, we choose one preference and name it rawlatest, which should be understood as the chosen "picture" of learning poverty in 2015.
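The idea of trimming rawfull down to one "picture" can be sketched as follows. This is not the logic of preferred_list, whose rules are not described on this page: the field names, the assessment labels and the ranking rule (preferred assessment first, most recent year as tie-breaker) are all assumptions chosen only to show how one observation per country might be kept.

```python
def apply_preference(rows: list[dict], assessment_order: list[str]) -> dict[str, dict]:
    """Keep one proficiency observation per country, following a ranked list of assessments."""
    rank = {name: position for position, name in enumerate(assessment_order)}
    best: dict[str, tuple[tuple[int, int], dict]] = {}
    for row in rows:
        if row["assessment"] not in rank:
            continue  # this preference ignores assessments it does not list
        # Lower tuples win: preferred assessment first, then the latest year.
        key = (rank[row["assessment"]], -row["year"])
        country = row["country"]
        if country not in best or key < best[country][0]:
            best[country] = (key, row)
    return {country: row for country, (_, row) in best.items()}


# Made-up rows with placeholder country codes and assessment names.
rawfull = [
    {"country": "AAA", "assessment": "TEST_A", "year": 2011},
    {"country": "AAA", "assessment": "TEST_B", "year": 2015},
    {"country": "BBB", "assessment": "TEST_B", "year": 2013},
    {"country": "BBB", "assessment": "TEST_B", "year": 2016},
]
picture = apply_preference(rawfull, ["TEST_A", "TEST_B"])
print(picture["AAA"]["assessment"], picture["BBB"]["year"])
# TEST_A 2016
```

Changing the order of the list passed as `assessment_order` yields a different "picture" from the same rows, which is the sense in which each preference is a different camera angle on rawfull.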
In this task folder we project the proficiency scores in 2030. It contains code in Stata and in R.
| Sub-Folder Name | Usage |
|---|---|
| 021_rawdata | This folder should contain no data on learning poverty, enrollment, proficiency or population: the do-files in 022_programs should only read learning poverty data from 013_outputs. If those outputs need to be modified for the 02_simulation task, that should be done in the do-files in 012_programs. This folder only contains inputs for generating the spells, that is, for comparing assessments over time. |
| 022_programs | Programs that run all the simulations |
| 023_outputs | This folder should start empty. It will store the outputs for all simulations. |
In this task, all valid spells are first created (0220) and then aggregated according to various rules into markdown files (0221). Those markdown files supply the growth rates used by the simulations (0222), which rely on an ado file to allow for flexibility in the simulation. This part runs exclusively in Stata.
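To make the notion of a spell concrete, the sketch below pairs consecutive observations of the same country and assessment and computes an annualized change for each pair. It is an illustration under stated assumptions, not a port of the Stata programs: the field names are hypothetical, and the validity rules applied in 0220 and the aggregation rules in 0221 are not reproduced here.

```python
from itertools import groupby


def build_spells(observations: list[dict]) -> list[dict]:
    """Pair consecutive observations of one country and assessment into spells."""
    def series_key(obs: dict) -> tuple[str, str]:
        return obs["country"], obs["assessment"]

    ordered = sorted(observations, key=lambda o: (*series_key(o), o["year"]))
    spells = []
    for (country, assessment), group in groupby(ordered, key=series_key):
        series = list(group)
        for start, end in zip(series, series[1:]):
            years = end["year"] - start["year"]
            if years <= 0:
                continue  # skip duplicated years
            spells.append({
                "country": country,
                "assessment": assessment,
                "start": start["year"],
                "end": end["year"],
                # Average change per year over the spell.
                "annual_change": (end["value"] - start["value"]) / years,
            })
    return spells


# Made-up observations with placeholder names and values.
observations = [
    {"country": "AAA", "assessment": "TEST_A", "year": 2010, "value": 40.0},
    {"country": "AAA", "assessment": "TEST_A", "year": 2015, "value": 50.0},
    {"country": "BBB", "assessment": "TEST_A", "year": 2012, "value": 30.0},
]
for spell in build_spells(observations):
    print(spell["country"], spell["start"], spell["end"], spell["annual_change"])
# AAA 2010 2015 2.0
```

A country with a single observation, such as BBB here, produces no spell, since there is nothing to compare over time.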
Though not incorporated in the technical paper, there are also R files that let users experiment with the simulation.
In this task folder we manipulate results from the previous tasks into summary tables. We also export the data in this project as indicators to the World Bank API. It runs exclusively in Stata.
| Sub-Folder Name | Usage |
|---|---|
| 031_rawdata | Contains only one csv, with the metadata of WB API indicators produced by this project |
| 032_programs | Programs that export all tables and graphs |
| 033_outputs | Starts empty, will receive several tables, plus the series of learning poverty indicators for the WB API |
For reproducibility purposes, we 'froze' the data gathered from multiple APIs and data sources in 01_data/011_rawdata/hosted_in_repo. In this task, we update those .md and .csv input files by running the queries to those APIs and updating the sources. It runs exclusively in Stata. Parts of this task may require access to the World Bank network.
| Sub-Folder Name | Usage |
|---|---|
| 041_rawdata | Raw data that does not come from APIs |
| 042_programs | Programs that update all input data files |
| 043_outputs | Starts empty, will receive updated files that may be transferred to 011_rawdata |
In this task folder we generate tables and graphs for the Learning Poverty technical paper. This includes some validation of the Learning Poverty measure using other assessments, such as PISA. It runs exclusively in Stata.
| Sub-Folder Name | Usage |
|---|---|
| 051_rawdata | Contains one Excel file that is the structure of all tables and graphs in the technical paper, plus inputs to the validation performed in the task |
| 052_programs | Programs that export all tables and graphs |
| 053_outputs | Starts with a ready-to-use copy of the Excel file with tables and figures, and one correlation analysis that requires access to microdata available only to WBG users. More files are added when the task is run. |
If you execute this task, you will have two Excel files in 053_outputs. One contains ready-to-use tables and figures (LPV_Tables_Figures_PAPER.xlsx) while the other is created on the fly (LPV_Tables_Figures.xlsx) based on the empty template and the results produced in the local clone of this repository. Unless the task 04_repo_update is run or any input file is changed by the user, both files will be identical.
All the diagrams and flowcharts were generated from text, in a manner similar to markdown, through mermaid.
To update the charts, you can use the mermaid live editor. Pasting the code from this page into the live editor renders the images displayed on this page.
At the time of writing, the GitHub markdown renderer does not support mermaid, so the charts can only be displayed by saving static .png files in the repo. Native support is a requested feature that may one day be added to GitHub.
graph LR
subgraph "011_rawdata: *.csv and *.md to *.dta"
raw_pop["population_1014 <br/> population_by_age <br/> primary_school_age"]
raw_pro["proficiency_from_GLAD <br/> proficiency_from_NLA <br/> proficiency_no_microdata"]
raw_enr["enrollment_edulit_uis <br/> enrollment_tenr_wbopendata <br/> enrollment_validated"]
subgraph "basic files in 013_output"
raw_enr-->|"0123_combine_enrollment_data <br/> 0124_enrollment_extrapolation"|enr[enrollment]
subgraph "final files in 013_output"
rawfull["rawfull <br/> long on proficiency <br/> wide on enrollment <br/> wide on population"]
preferred{"keep 1 obs by cty for proficiency <br/> keep 1 set of vars for enrollment<br/> keep 1 set of vars for population"}
disp(("display global and <br/> regional aggregates"))