NB: This package is a work in progress and subject to change; the documentation may not reflect the current code.
Static and dynamic, population and household, microsimulation models. Take a base population and project it forward using various methodologies.
Current status:
- static population microsimulation: refinement/testing
- dynamic population microsimulation: in progress
- quasi-dynamic household microsimulation: basic model
- population-household assignment algorithm: basic implementation
- coupled dynamic household-population microsimulation: drawing board
- microsynthesis: generating a synthetic population from aggregate (categorical) data and (usually) joint distribution data
- microsimulation: evolving a (microsynthesised) population forward in time
- static: in this context, this means generating a synthetic population using estimated aggregate data, and a microsynthesis as a joint distribution
- dynamic: in this context, this means evolving each entity as a dynamic process, typically probabilistically, e.g. using Monte-Carlo sampling with fertility/mortality/migration rates.
- quasi-dynamic: in this context, a simplistic evolution whereby existing entities persist according to a survival probability and new entities are created randomly to match a static estimate.
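As a rough illustration of the quasi-dynamic idea defined above, the sketch below keeps each existing entity with a fixed survival probability and then creates new entities at random until a target count is met. The function name, the dictionary layout, the rule for trimming surplus survivors, and all numbers are assumptions made for illustration; they are not taken from this package.

```python
import random

def quasi_dynamic_step(entities, survival_prob, target_size, rng):
    """One quasi-dynamic step (illustrative only).

    Existing entities persist with probability survival_prob; new entities
    are then created at random until target_size is reached.
    """
    survivors = [e for e in entities if rng.random() < survival_prob]
    # Assumption: if more entities survive than the estimate allows,
    # drop the surplus at random.
    if len(survivors) > target_size:
        survivors = rng.sample(survivors, target_size)
    next_id = max((e["id"] for e in entities), default=-1) + 1
    while len(survivors) < target_size:
        # Assumption: a new entity copies the type of a random existing one.
        template = rng.choice(entities)
        survivors.append({"id": next_id, "type": template["type"]})
        next_id += 1
    return survivors

rng = random.Random(0)
households = [
    {"id": i, "type": rng.choice(["single", "couple", "family"])}
    for i in range(100)
]
next_year = quasi_dynamic_step(households, survival_prob=0.95, target_size=105, rng=rng)
print(len(next_year))
```

The target count plays the role of the static estimate: whatever the random survival outcome, the step always returns exactly that many entities.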
This refers to a sequence of microsyntheses, seeded with 2011 census data, with marginals from ONS mid-year-estimates (2001-2013) and ONS sub-national population projections (2014-2039).
- limited categories available: gender, age, ethnicity.
- highest geographical resolution is currently limited to MSOA (census tables with single year of age are not available at a lower resolution)
This refers to a sequence of microsyntheses, seeded with microsynthesised data, with overall counts coming from DCLG household forecasts (1991-2039).
- microsynthesised data is generated by the upstream household_microsynth model, at OA resolution in ~10 categorical variables.
- Projection model is currently crude: simply samples the base population until the desired total population is achieved.
This refers to a stochastic simulation of individual elements (persons or households) in time using a Monte-Carlo approach. Based on provided fertility, mortality and migration data and guided by static microsimulation above.
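A minimal sketch of a Monte-Carlo step of this kind is shown below, assuming annual mortality and fertility probabilities are supplied as functions of a person's attributes. The function names, the person layout, and the rates are hypothetical and purely illustrative; migration and the guidance from the static microsimulation are omitted.

```python
import random

def simulate_year(people, mortality, fertility, rng):
    """Advance a population by one year (illustrative only).

    people: list of dicts with "age" and "sex" ("M" or "F").
    mortality, fertility: functions returning an annual probability.
    """
    next_people = []
    for p in people:
        if rng.random() < mortality(p):
            continue  # this person dies and is not carried forward
        if p["sex"] == "F" and rng.random() < fertility(p):
            # A birth adds a new person aged 0 with a random sex.
            next_people.append({"age": 0, "sex": rng.choice("MF")})
        next_people.append({"age": p["age"] + 1, "sex": p["sex"]})
    return next_people

# Made-up rates, for demonstration only.
def example_mortality(p):
    return 0.01 if p["age"] < 70 else 0.08

def example_fertility(p):
    return 0.06 if 20 <= p["age"] <= 40 else 0.0

rng = random.Random(1)
population = [{"age": rng.randrange(0, 90), "sex": rng.choice("MF")} for _ in range(1000)]
for year in range(5):
    population = simulate_year(population, example_mortality, example_fertility, rng)
print(len(population))
```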
Clone the repo:
$ git clone https://github.com/nismod/microsimulation
Requires python 3. The following packages are dependencies, and will need to be installed if not already:
- UKCensusAPI: wrapper around the nomisweb API for census data
- ukpopulation: wrapper around national and subnational population projection data
- humanleague: microsynthesis package
$ python3 -m pip install humanleague ukpopulation ukcensusapi
(humanleague is not currently available via conda-forge, so should be installed with pip for now)
$ conda config --add channels conda-forge # if you haven't already
$ conda install ukcensusapi ukpopulation
The ukcensusapi package requires an API key to function correctly, see here for details
From the root directory of the cloned repo:
$ ./setup.py install
$ ./setup.py test
$ scripts/run_ssm.py --help
usage: run_ssm.py [-h] [-c config-file] LAD [LAD ...]
static sequential (population/household) microsimulation
positional arguments:
LAD ONS code for LAD (multiple LADs can be set).
optional arguments:
-h, --help show this help message and exit
-c config-file, --config config-file
the model configuration file (json). See
config/*_example.json
where config-file is a JSON file containing the model parameters and settings. Examples can be found in the config subdirectory of this package.
{
"resolution": "MSOA11",
"projection": "ppp",
"census_ref_year": 2011,
"horizon_year": 2039,
"mode": "fast",
"cache_dir": "./cache",
"output_dir": "./data"
}
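As a sketch of how such a file might be read and sanity-checked before a run, the snippet below parses the example configuration above. The helper name, the assumption that every key shown is required, and the year check are illustrative guesses; the package's own validation may differ.

```python
import json

# Assumption: every key in the example configuration is required.
REQUIRED_KEYS = {
    "resolution", "projection", "census_ref_year", "horizon_year",
    "mode", "cache_dir", "output_dir",
}

def load_config(text):
    """Parse a JSON configuration string and apply basic checks."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError(f"missing configuration keys: {sorted(missing)}")
    # Assumption: the projection horizon cannot precede the census year.
    if config["horizon_year"] < config["census_ref_year"]:
        raise ValueError("horizon_year must not precede census_ref_year")
    return config

example = """
{
  "resolution": "MSOA11",
  "projection": "ppp",
  "census_ref_year": 2011,
  "horizon_year": 2039,
  "mode": "fast",
  "cache_dir": "./cache",
  "output_dir": "./data"
}
"""
config = load_config(example)
print(config["resolution"], config["horizon_year"])
```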
This model requires, as input, a microsynthesised population of households for one or more LADs at OA level for a census year. This data can be generated from census (aggregate) data using the household_microsynth package.
$ scripts/run_ssm_h.py --help
usage: run_ssm_h.py [-h] [-c config-file] LAD [LAD ...]
static sequential (population/household) microsimulation
positional arguments:
LAD ONS code for LAD (multiple LADs can be set).
optional arguments:
-h, --help show this help message and exit
-c config-file, --config config-file
the model configuration file (json). See
config/*_example.json
where config-file is a JSON file containing the model parameters and settings. Examples can be found in the config subdirectory of this package.
{
"resolution": "OA11",
"projection": "ppp",
"census_ref_year": 2011,
"projection_ref_year": 2014,
"horizon_year": 2020,
"upstream_dir": "../household_microsynth/data",
"input_dir": "./persistent_data",
"output_dir": "./data"
}
This algorithm takes LAD-level populations and households at a specific time and assigns people to the households.
$ scripts/run_assignment.py --help
usage: run_assignment.py [-h] [-c config-file] LAD [LAD ...]
static sequential (population/household) microsimulation
positional arguments:
LAD ONS code for LAD (multiple LADs can be set).
optional arguments:
-h, --help show this help message and exit
-c config-file, --config config-file
the model configuration file (json). See
config/*_example.json
with a configuration like:
$ cat config/ass_example.json
{
"person_resolution": "MSOA11",
"household_resolution": "OA11",
"projection": "ppp",
"strict": true,
"year": 2011,
"data_dir": "./data"
}
It requires data from the household microsimulations and the population microsimulations as described above.
The methodology is to randomly sample members of the synthetic population from distributions defined by census microdata. Broadly speaking, this relates the age, sex, and ethnicity of the HRP (household reference person) to the age, sex, and ethnicity of other household members. It helps to avoid nonsensical or unlikely household combinations, such as cohabiting couples with enormous age differences, or children who are only fractionally younger than a parent. The effect is to preserve the distribution of household structures seen in the last census. More up-to-date information may be available from surveys (e.g. BHPS), but these may lack the breadth of the census microdata.
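To make the sampling idea concrete, the sketch below draws a partner's age band conditional on the age band of the household reference person. The age bands, the probability table, and the function name are invented for illustration; in practice the distributions would be derived from census microdata and would also cover sex and ethnicity.

```python
import random

# Hypothetical P(partner age band | HRP age band). All values are made up.
PARTNER_AGE_GIVEN_HRP = {
    "16-34": {"16-34": 0.85, "35-54": 0.14, "55+": 0.01},
    "35-54": {"16-34": 0.15, "35-54": 0.75, "55+": 0.10},
    "55+": {"16-34": 0.01, "35-54": 0.14, "55+": 0.85},
}

def sample_partner_band(hrp_band, rng):
    """Draw a partner age band conditional on the HRP's age band."""
    dist = PARTNER_AGE_GIVEN_HRP[hrp_band]
    bands = list(dist)
    weights = list(dist.values())
    return rng.choices(bands, weights=weights, k=1)[0]

rng = random.Random(42)
draws = [sample_partner_band("55+", rng) for _ in range(1000)]
print({band: draws.count(band) for band in PARTNER_AGE_GIVEN_HRP})
```

Because the weights concentrate on similar age bands, pairings with a very large age gap are drawn only rarely, which is how unlikely household combinations are kept to a realistic share.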
Of the household structures defined in the census, all contain one household reference person, and some categories are more precise about the number and status of the occupants. For example, single-occupant households must contain a single adult; single-parent households of size 3 must contain one adult and two children. Conversely, multiple occupant households containing 4+ occupants are less well defined.
The approach taken in the algorithm is to get the specific structures assigned first. There is additional leeway provided by the facts that:
- the household data includes empty households (as per census), which can be populated if necessary.
- the population data is at a lower geographical resolution so a given household (in a specific OA) has a larger pool of people to sample from (the containing MSOA).
The notion of assignment in this context means linking rows in two tables: the household table is given an additional column that refers to an entry in the person table (the HRP), and the person table is given a column containing a household ID. Once assignment is complete, every person will be associated with a household, and every household will be associated with an HRP. Once a household is filled, it is marked as such and no more people can be assigned to it.
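The linkage can be pictured with two small in-memory tables, as in the sketch below. The column names (`hid`, `pid`, `hrp`, `size`, `filled`) and the helper functions are hypothetical and chosen for illustration; the package's actual tables and column names may differ.

```python
# Hypothetical household and person tables, as lists of rows.
households = [
    {"hid": 0, "size": 1, "hrp": None, "filled": False},
    {"hid": 1, "size": 2, "hrp": None, "filled": False},
]
people = [{"pid": i, "hid": None} for i in range(3)]

def occupants(hid):
    """Count the people currently linked to household hid."""
    return sum(1 for p in people if p["hid"] == hid)

def assign(person, household, as_hrp=False):
    """Link a person to a household, refusing if either is already taken."""
    if household["filled"]:
        raise ValueError(f"household {household['hid']} is already full")
    if person["hid"] is not None:
        raise ValueError(f"person {person['pid']} is already assigned")
    person["hid"] = household["hid"]
    if as_hrp:
        household["hrp"] = person["pid"]
    # Mark the household as filled once it reaches its size.
    if occupants(household["hid"]) == household["size"]:
        household["filled"] = True

assign(people[0], households[0], as_hrp=True)
assign(people[1], households[1], as_hrp=True)
assign(people[2], households[1])
print(households)
```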
The algorithm loops over the MSOAs in the regions, assigning people to households in the following order:
- HRP. This is the key link between people and households. We rely heavily on distributions from census microdata that link the HRP characteristics with those of other members of the household.
- partners of HRPs are then sampled for the relevant households.
- children are then sampled
- multi-person households
- communal establishments
At this point many households will be fully assigned, but there will generally be unassigned adults and children in the population. They are assigned to those households that are not already full.
This process is repeated for each MSOA in the region.
HPC facilities are necessary to run a country-wide simulation in any reasonable timeframe (for assignment at least). The examples below have been run on the ARC3 environment, part of the High Performance Computing facilities at the University of Leeds, UK.
The scripts should be relatively easy to modify to run on other clusters supporting SGE.
Run countrywide, using the default configuration:
$ qsub ./pbatch.sh config/ssm_default.json
The SSM algorithm runs sufficiently quickly that each individual process computes 10 LADs consecutively.
Run countrywide, using the default configuration:
$ qsub ./hbatch.sh config/ssm_h_default.json
The SSM algorithm runs sufficiently quickly that each individual process computes 10 LADs consecutively.
Run countrywide, using the default configuration:
$ qsub ./abatch.sh config/ass_default.json
Run a single LAD (Newcastle):
$ qsub ./asingle.sh config/ass_default.json E08000021
The assignment algorithm runs sufficiently slowly that each LAD requires a dedicated process.
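The two batching schemes described above (ten LADs per process for the microsimulations, one LAD per process for assignment) amount to slicing the list of LAD codes by task index. The sketch below assumes an SGE array job that exposes a 1-based `SGE_TASK_ID` environment variable; whether the batch scripts in this repo work this way is an assumption, and the helper name and placeholder LAD codes are hypothetical.

```python
import os

def lads_for_task(lad_codes, task_id, batch_size):
    """Return the LAD codes handled by a 1-based array task index."""
    if task_id < 1:
        raise ValueError("task_id is 1-based")
    start = (task_id - 1) * batch_size
    return lad_codes[start:start + batch_size]

# Placeholder codes, not real ONS LAD codes.
lad_codes = [f"LAD{i:03d}" for i in range(25)]

# Fall back to task 1 when not running as an array job.
raw = os.environ.get("SGE_TASK_ID", "1")
task_id = int(raw) if raw.isdigit() else 1

print(lads_for_task(lad_codes, task_id, batch_size=10))  # microsimulation-style batch
print(lads_for_task(lad_codes, task_id, batch_size=1))   # assignment-style, one LAD
```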