Population synthesis is a method for creating a fully enumerated population of the ARC region (persons and households) from a population sample. The ARC population synthesizer was developed as a flexible tool for creating synthetic populations for activity-based modeling. It takes as input Census data, specifically the Public Use Microdata Sample (PUMS), together with zonal-level and regional marginal distributions of households by various characteristics. These distributions serve as controls, or targets, that the synthetic population attempts to match.
The person and household controls may be specified at three main levels of spatial aggregation: microzones (MAZ), traffic analysis zones (TAZ), and districts. For ARC, these levels correspond to TAZs, LUZ (PECAS) zones, and county groups, respectively. Controls at the district level are also known as META-controls. Some counties were grouped together to form the districts so that each META-geography unit is at least as large as a PUMA. The basic steps of the population synthesizer are described below, and the algorithm is illustrated in Figure 2-1.
1. MAZ-level control data are aggregated to the Census PUMA level.
2. PUMS household record weights are list balanced to match PUMA controls.
3. Weighted households are aggregated by PUMA to META-level control categories.
4. PUMA-level totals are factored to match META-level control totals.
5. Factored PUMA-level META controls are appended to the original PUMA controls.
6. Final PUMA household record weights are determined by list balancing to match the expanded set of PUMA controls.
7. After list balancing for the PUMA, a linear programming solver is used to discretize the fractional weights.
8. By PUMA, households are allocated to TAZs within the PUMA.
   - The allocation procedure involves list balancing of the PUMA household records to match TAZ control totals (aggregated from MAZ controls).
   - TAZs are processed in order of their number of households, from smallest to largest.
   - Initial weights for the household records are the integer weights determined from the final PUMA-level balancing.
   - Only the household records with non-zero initial weights are used for balancing.
   - After list balancing for the TAZ, a linear programming solver is used to discretize the fractional weights. The resulting integer weights for the household records match the TAZ controls.
   - The PUMA-level initial weights are reduced by the final TAZ integer weights.
9. After all TAZs have been allocated households, MAZs are allocated households using a procedure identical to the TAZ allocation, except that the set of PUMS records consists of the records with non-zero weights in each TAZ.
10. The final output is, for each MAZ, a table of household records with final integer weights that sum to the number of households in the MAZ.
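The balancing and discretization steps above can be illustrated with a small sketch. It is not the ARC implementation: the balancer here is a plain iterative proportional scaling over the household list, and the discretization uses largest-remainder rounding as a simple stand-in for the linear programming solver named in the steps. The function names (`list_balance`, `integerize`), the four seed households, and the control totals are all hypothetical.

```python
import math


def list_balance(incidence, controls, init_weights, max_iter=200, tol=1e-9):
    """Scale household weights until weighted totals match each control.

    incidence[h][c] is household h's contribution to control c
    (1 if the household falls in that category, otherwise 0).
    """
    weights = list(init_weights)
    for _ in range(max_iter):
        max_gap = 0.0
        for c, target in enumerate(controls):
            total = sum(w * row[c] for w, row in zip(weights, incidence))
            if total <= 0:
                continue  # no seed record can contribute to this control
            factor = target / total
            max_gap = max(max_gap, abs(factor - 1.0))
            # Only households that count toward control c are rescaled.
            weights = [w * factor if row[c] else w
                       for w, row in zip(weights, incidence)]
        if max_gap < tol:
            break  # every control was already matched in this pass
    return weights


def integerize(weights, total_households):
    """Largest-remainder rounding (assumed stand-in for the LP step)."""
    floors = [math.floor(w) for w in weights]
    shortfall = max(0, total_households - sum(floors))
    by_remainder = sorted(range(len(weights)),
                          key=lambda i: weights[i] - floors[i],
                          reverse=True)
    for i in by_remainder[:shortfall]:
        floors[i] += 1
    return floors


# Hypothetical seed: columns are [size 1, size 2+, low income, high income].
incidence = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
controls = [3, 7, 4, 6]  # hypothetical zone totals for the four categories
balanced = list_balance(incidence, controls, init_weights=[1.0] * 4)
integer_weights = integerize(balanced, total_households=10)
```

Unlike an LP-based discretization, largest-remainder rounding only guarantees that the household total is preserved; it matches the individual controls in this toy case, but not in general.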
The ARC population synthesis uses the controls summarized in Table 2-1 below. The seed household and person populations were obtained from the 2007-2011 5-Year PUMS datasets. The outputs of the population synthesis process are lists of synthetic households and persons that reside in the 21-county ARC region, as shown in Table 2-2 and Table 2-3.
Table 2-1 ARC Population Synthesis Controls
Control | Categories | Geography | Data Source |
---|---|---|---|
Number of HHs | N/A | Region | ARC socio-economic forecast |
Number of HH by income (2010$) | 0-25k, 25k-60k, 60k-120k, 120k plus | MAZ (ARC TAZ) | ARC socio-economic forecast |
Number of HH by HH size | 1, 2, 3, 4, 5, 6 plus | MAZ (ARC TAZ) | ARC socio-economic forecast |
Number of HH by workers | 0, 1, 2, 3 plus | MAZ (ARC TAZ) | CTPP 2010 |
Number of persons by age | 0-14, 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85 plus | District (ARC County group) | ARC socio-economic forecast |
Number of persons by occupation | CL23WhiteCollar, CL24Services, CL25Health, CL26Retail, CL27BlueCollar | TAZ (ARC PECAS zone) | PECAS model, CTPP PECAS zone level worker distribution, and total regional employment |
Table 2-2 Synthetic Population Household Table in Expanded Form
FIELD | Description |
---|---|
HHID | Unique household ID |
TEMPID | Unexpanded household ID |
DISTRICT | District code (ARC County group) in which HH is located |
PUMA | PUMA code for HH record |
TAZ | TAZ (ARC PECAS zone) code in which HH is located |
MAZ | MAZ (ARC TAZ) code in which HH is located |
WGTP | Housing Weight |
FINALPUMSID | HH ID generated during population synthesis |
FINALWEIGHT | HH weight generated during population synthesis |
SERIALNO | Unique housing PUMS record identifier |
NWRKRS_ESR | Number of workers in the household |
... | Other PUMS HH fields |
Table 2-3 Synthetic Population Person Table in Expanded Form
FIELD | Description |
---|---|
HHID | Unique household ID |
PERID | Unique person ID |
TEMPID | Unexpanded household ID |
DISTRICT | District code (ARC County group) in which HH is located |
PUMA | PUMA code for HH record |
TAZ | TAZ (ARC PECAS zone) code in which HH is located |
MAZ | MAZ (ARC TAZ) code in which HH is located |
WGTP | Housing Weight |
FINALPUMSID | Person ID generated during population synthesis |
FINALWEIGHT | Person weight generated during population synthesis |
SPORDER | Person number in HH |
EMPLOYED | Is person employed |
PECAS_OCC | PECAS Occupation code for this person |
... | Other PUMS Person fields |
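As a rough illustration of how the expanded tables relate to the weighted records, the sketch below replicates each household record once per unit of its integer weight and attaches the matching person records. It rests on assumptions that the tables above do not confirm: that FINALWEIGHT is the replication count, that HHID and PERID are assigned sequentially, and that TEMPID links person records to their unexpanded household. The function names and sample records are hypothetical.

```python
def expand_households(records):
    """One output row per synthetic household (assumed expansion rule)."""
    expanded = []
    next_hhid = 1
    for rec in records:
        # Assumption: FINALWEIGHT is the number of copies to generate.
        for _ in range(rec["FINALWEIGHT"]):
            row = dict(rec)
            row["HHID"] = next_hhid  # assumed sequential unique ID
            expanded.append(row)
            next_hhid += 1
    return expanded


def expand_persons(expanded_households, persons_by_tempid):
    """Copy each household's persons onto every expanded household row."""
    expanded = []
    next_perid = 1
    for hh in expanded_households:
        for person in persons_by_tempid.get(hh["TEMPID"], []):
            row = dict(person)
            row["HHID"] = hh["HHID"]
            row["MAZ"] = hh["MAZ"]
            row["PERID"] = next_perid  # assumed sequential unique ID
            expanded.append(row)
            next_perid += 1
    return expanded


# Hypothetical weighted household records for one MAZ.
weighted_households = [
    {"TEMPID": 1, "MAZ": 101, "FINALWEIGHT": 2},
    {"TEMPID": 2, "MAZ": 101, "FINALWEIGHT": 1},
]
# Hypothetical person records keyed by unexpanded household ID.
persons_by_tempid = {
    1: [{"TEMPID": 1, "SPORDER": 1}, {"TEMPID": 1, "SPORDER": 2}],
    2: [{"TEMPID": 2, "SPORDER": 1}],
}

household_rows = expand_households(weighted_households)
person_rows = expand_persons(household_rows, persons_by_tempid)
```

Under these assumptions, the number of expanded household rows in an MAZ equals the sum of the integer weights for that MAZ, which is the property the final step of the algorithm guarantees.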