Population synthesis is a method for creating a fully-enumerated
population of the ARC region (persons and households) based on a
population sample. The ARC population synthesizer was developed to be a
flexible tool for creating synthetic populations for activity-based
modeling. It takes as input Census data, specifically the Public Use
Microdata Sample (PUMS), together with zonal-level and regional marginal
distributions of households and persons by various characteristics. These
distributions serve as controls, or targets, that the synthetic
population attempts to match.
The person and household controls may be specified at three main levels
of spatial aggregation: microzones (MAZ), traffic analysis zones (TAZ),
and districts. For ARC, these levels correspond to ARC TAZs, LUZ
(PECAS) zones, and county groups, respectively. Controls at the district
level are also known as META-controls. Some counties were grouped
together to form the districts so that each META-geography unit is at
least as big as a PUMA.
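The nesting of these geographies can be illustrated with a small crosswalk. The sketch below is a minimal Python example, assuming each MAZ nests within exactly one TAZ, one PUMA, and one district; the `CROSSWALK` and `MAZ_CONTROLS` tables, the zone IDs, and the `aggregate_controls` helper are hypothetical and not part of the ARC code base. It sums MAZ-level control counts to any higher level, which is the kind of roll-up the synthesizer performs when it aggregates MAZ controls to PUMAs.

```python
from collections import defaultdict

# Hypothetical crosswalk: each MAZ (ARC TAZ) maps to exactly one TAZ
# (PECAS zone), one PUMA, and one district (county group).
CROSSWALK = {
    101: {"TAZ": 1, "PUMA": 1001, "DISTRICT": 1},
    102: {"TAZ": 1, "PUMA": 1001, "DISTRICT": 1},
    103: {"TAZ": 2, "PUMA": 1002, "DISTRICT": 1},
}

# Hypothetical MAZ-level household-size controls
MAZ_CONTROLS = {
    101: {"hhsize1": 10, "hhsize2": 5},
    102: {"hhsize1": 4, "hhsize2": 6},
    103: {"hhsize1": 7, "hhsize2": 3},
}


def aggregate_controls(maz_controls, crosswalk, level):
    """Sum MAZ-level control counts up to the given level ("TAZ", "PUMA", "DISTRICT")."""
    totals = defaultdict(lambda: defaultdict(float))
    for maz, controls in maz_controls.items():
        zone = crosswalk[maz][level]
        for category, value in controls.items():
            totals[zone][category] += value
    # Return plain dicts so the result prints cleanly
    return {zone: dict(cats) for zone, cats in totals.items()}


print(aggregate_controls(MAZ_CONTROLS, CROSSWALK, "PUMA"))
# {1001: {'hhsize1': 14.0, 'hhsize2': 11.0}, 1002: {'hhsize1': 7.0, 'hhsize2': 3.0}}
```

Passing `"DISTRICT"` instead of `"PUMA"` produces the district-level (META) totals from the same MAZ inputs.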
The basic steps of the population synthesizer are described below:
1. MAZ-level control data are aggregated to the Census PUMA level.
2. PUMS household record weights are list balanced to match the PUMA controls.
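List balancing adjusts the PUMS household weights until the weighted sums reproduce each control. The sketch below is a minimal example, assuming a plain iterative proportional fitting (IPF) scheme over household-level incidence values; the production balancer may differ (for example, in how it weights or relaxes individual controls), and `list_balance` and its arguments are hypothetical names. Each incidence value states how much a household contributes to a control: 1 for a household in a size or income category, or the household's number of persons in an age group for a person-level control.

```python
def list_balance(weights, incidence, targets, max_iter=100, tol=1e-6):
    """Scale household weights so weighted control sums match the targets.

    weights   -- initial household weights (e.g. PUMS household weights)
    incidence -- one dict per household: control name -> contribution
                 (1/0 for household categories, person counts for person controls)
    targets   -- control name -> target total
    """
    w = [float(x) for x in weights]
    for _ in range(max_iter):
        max_change = 0.0
        for control, target in targets.items():
            achieved = sum(wi * inc.get(control, 0) for wi, inc in zip(w, incidence))
            if achieved <= 0:
                continue  # no sample household contributes to this control
            factor = target / achieved
            # Rescale only the households that contribute to this control
            for i, inc in enumerate(incidence):
                if inc.get(control, 0) > 0:
                    w[i] *= factor
            max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break  # every control was already matched in this pass
    return w


# Four sample households cross-classified by size and income
incidence = [
    {"size1": 1, "inc_low": 1},
    {"size1": 1, "inc_high": 1},
    {"size2": 1, "inc_low": 1},
    {"size2": 1, "inc_high": 1},
]
targets = {"size1": 30, "size2": 70, "inc_low": 40, "inc_high": 60}
balanced = list_balance([1, 1, 1, 1], incidence, targets)
print([round(x, 3) for x in balanced])  # [12.0, 18.0, 28.0, 42.0]
```

Because each control is rescaled in turn, later controls can disturb earlier ones; the loop repeats until every factor in a pass is within `tol` of 1.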
3. Weighted households are aggregated by PUMA to the META-level control categories.
4. The PUMA-level totals are factored to match the META-level control totals.
5. The factored PUMA-level META controls are appended to the original PUMA controls.
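The META-control steps described above (aggregating, factoring, and appending) can be sketched as follows. This is a minimal example under stated assumptions: all PUMAs passed in lie within a single district, the category names (`age0_14`, `households`, and so on) are hypothetical, and META category names do not collide with the PUMA control names. The function names are also hypothetical.

```python
def factor_meta_controls(puma_estimates, meta_targets):
    """Scale PUMA-level estimates of META categories to match district totals.

    puma_estimates -- PUMA -> {category: weighted total from balanced PUMS weights}
    meta_targets   -- category -> district (META) control total
    All PUMAs passed in are assumed to lie in the same district.
    """
    factored = {puma: {} for puma in puma_estimates}
    for category, target in meta_targets.items():
        estimate = sum(est.get(category, 0.0) for est in puma_estimates.values())
        # If no PUMA has an estimate for this category, the target cannot be split
        factor = target / estimate if estimate > 0 else 0.0
        for puma, est in puma_estimates.items():
            factored[puma][category] = est.get(category, 0.0) * factor
    return factored


def append_meta_controls(puma_controls, factored_meta):
    """Extend each PUMA's control set with its factored META controls.

    META category names are assumed not to collide with PUMA control names.
    """
    return {puma: {**controls, **factored_meta.get(puma, {})}
            for puma, controls in puma_controls.items()}


# Hypothetical persons-by-age estimates for two PUMAs in one district
estimates = {1001: {"age0_14": 100, "age15_24": 50},
             1002: {"age0_14": 300, "age15_24": 150}}
meta = factor_meta_controls(estimates, {"age0_14": 480, "age15_24": 210})
controls = append_meta_controls({1001: {"households": 200}, 1002: {"households": 600}}, meta)
print(controls[1001])
```

Factoring preserves each PUMA's share of a category while forcing the PUMA values to add up to the district total.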
6. Final PUMA household record weights are determined by list balancing to match the expanded set of PUMA controls.
7. After list balancing for the PUMA, a linear programming solver is used to discretize the fractional weights into integer weights.
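The discretization step turns fractional weights into whole households. The procedure described here uses a linear programming solver; as a simpler stand-in, the sketch below uses largest-remainder rounding, which preserves the rounded total weight but, unlike an LP formulation, makes no attempt to keep each individual control matched. The `integerize` name is hypothetical.

```python
import math


def integerize(weights):
    """Largest-remainder rounding (a stand-in for the LP discretization).

    Floors every weight, then adds one unit to the records with the largest
    fractional parts until the integer total equals round(sum(weights)).
    """
    floors = [math.floor(w) for w in weights]
    shortfall = int(round(sum(weights))) - sum(floors)
    # Record indices ordered by fractional part, largest first
    order = sorted(range(len(weights)),
                   key=lambda i: weights[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors


print(integerize([1.2, 2.7, 0.6, 3.5]))  # [1, 3, 1, 3]
```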
8. Within each PUMA, households are allocated to the TAZs in that PUMA:
   - The allocation procedure list balances the PUMA household records to match the TAZ control totals (aggregated from the MAZ controls).
   - TAZs are processed in order of their number of households, from smallest to largest.
   - The initial weights for the household records are the integer weights from the final PUMA-level balancing.
   - Only household records with non-zero initial weights are used in the balancing.
   - After list balancing for a TAZ, a linear programming solver discretizes the fractional weights; the resulting integer weights for the household records match the TAZ controls.
   - The PUMA-level initial weights are reduced by the final TAZ integer weights.
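The TAZ allocation loop can be sketched as below. To stay short, the sketch makes two simplifying assumptions that are not part of the documented procedure: its balancing step matches only each TAZ's household total (the procedure above balances to the full set of TAZ controls), and the LP discretization is replaced by largest-remainder rounding. All function names are hypothetical. What the sketch does reflect is the ordering of TAZs from smallest to largest, the restriction to records with non-zero remaining weight, and the reduction of the PUMA-level weights after each TAZ.

```python
import math


def scale_to_total(weights, total):
    """Simplified stand-in for list balancing: match only a household total."""
    current = sum(weights)
    if current <= 0:
        return [float(w) for w in weights]
    return [w * total / current for w in weights]


def round_preserving_total(weights):
    """Stand-in for LP discretization: largest-remainder rounding."""
    floors = [math.floor(w) for w in weights]
    shortfall = int(round(sum(weights))) - sum(floors)
    order = sorted(range(len(weights)),
                   key=lambda i: weights[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors


def allocate_puma_to_tazs(puma_weights, taz_households):
    """Allocate integer PUMA-level household weights to the TAZs in the PUMA.

    puma_weights   -- record id -> integer weight from PUMA-level balancing
    taz_households -- TAZ id -> household control total for that TAZ
    Returns TAZ id -> {record id: integer weight}.
    """
    remaining = dict(puma_weights)
    allocation = {}
    # Process TAZs from the fewest to the most households
    for taz in sorted(taz_households, key=taz_households.get):
        # Only records with non-zero remaining weight take part
        ids = [r for r, w in remaining.items() if w > 0]
        fractional = scale_to_total([remaining[r] for r in ids], taz_households[taz])
        # Never give a TAZ more copies of a record than the PUMA has left
        integer = [min(x, remaining[r]) for x, r in zip(round_preserving_total(fractional), ids)]
        allocation[taz] = {r: x for r, x in zip(ids, integer) if x > 0}
        # Reduce the PUMA-level weights by what this TAZ received
        for r, x in zip(ids, integer):
            remaining[r] -= x
    return allocation


alloc = allocate_puma_to_tazs({"A": 3, "B": 2, "C": 5}, {1: 6, 2: 4})
print(alloc)  # {2: {'A': 1, 'B': 1, 'C': 2}, 1: {'A': 2, 'B': 1, 'C': 3}}
```

Processing the smallest TAZs first gives them first pick of the record pool, while the larger TAZs, which can absorb more variety, work with what remains.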
9. After households have been allocated to all TAZs, they are allocated to MAZs using a procedure identical to the TAZ allocation, except that the PUMS records used for each TAZ are limited to those with non-zero weight in that TAZ.
10. The final output is a table, for each MAZ, of household records with final integer weights that sum to the number of households in the MAZ.
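A simple consistency check on the final table follows directly from its definition: within each MAZ, the integer weights must sum to the MAZ household total. The sketch below is hypothetical (the `check_maz_totals` name and the input layout of `(MAZ, FINALWEIGHT)` pairs are assumptions) and reports every MAZ where the two disagree.

```python
from collections import defaultdict


def check_maz_totals(records, maz_households):
    """Compare summed integer weights per MAZ with MAZ household controls.

    records        -- iterable of (MAZ id, FINALWEIGHT) pairs from the output table
    maz_households -- MAZ id -> household control total
    Returns MAZ id -> (synthesized, control) for every MAZ that does not match.
    """
    synthesized = defaultdict(int)
    for maz, weight in records:
        synthesized[maz] += weight
    mismatches = {}
    for maz in set(synthesized) | set(maz_households):
        got, want = synthesized.get(maz, 0), maz_households.get(maz, 0)
        if got != want:
            mismatches[maz] = (got, want)
    return mismatches


print(check_maz_totals([(101, 2), (101, 3), (102, 4)], {101: 5, 102: 5}))  # {102: (4, 5)}
```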
The algorithm is illustrated in Figure 2-1 below.
Figure 2-1. PopSyn Flow Chart
The ARC population synthesis uses the controls summarized in Table 2-1 below. The seed household and person records were obtained from the 2007-2011 5-year PUMS datasets. The outputs of the population synthesis process are lists of synthetic households and persons residing in the 21-county ARC region; their fields are listed in Table 2-2 and Table 2-3.
Table 2-1 ARC Population Synthesis Controls
| Control | Categories | Geography | Data Source |
|---|---|---|---|
| Number of households | N/A | Region | ARC socio-economic forecast |
| Number of non-group quarter households | N/A | Region | ARC socio-economic forecast |
| Number of non-group quarter households by household size | 1, 2, 3, 4, 5, 6+ | MAZ (ARC TAZ) | ARC socio-economic forecast |
| Number of non-group quarter households by income (2010$) | 0-22675, 22675-54588, 54588-109175, 109175+ | MAZ (ARC TAZ) | ARC socio-economic forecast |
| Number of non-group quarter households by workers | 0, 1, 2, 3+ | MAZ (ARC TAZ) | CTPP 2010 |
| Number of major university students | Living on campus, living off campus | MAZ (ARC TAZ) | IPEDS Fall 2019 data |
| Number of persons by age | 0-14, 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85+ | District (ARC County group) | ARC socio-economic forecast |
| Number of persons by occupation | CL01BlueCollar, CL02Health, CL03RetailandFood, CL04Services, CL05WhiteCollar | TAZ (ARC PECAS zone) | PECAS model, CTPP PECAS zone level worker distribution, and total regional employment |
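Before balancing, each PUMS record has to be mapped to the control categories in Table 2-1. The sketch below shows this for household size, workers, income, and person age. Two assumptions are made: a boundary value shared by two ranges (for example, 22675) is placed in the higher category, because the table lists ranges with shared endpoints, and the income passed in is already expressed in 2010 dollars (which PUMS field holds that value is not stated here). The function and constant names are hypothetical.

```python
from bisect import bisect_right

# Category boundaries from Table 2-1. Placing a shared boundary value in the
# higher category (lower bound inclusive) is an assumption.
INCOME_BREAKS = [22675, 54588, 109175]  # 2010 dollars
INCOME_LABELS = ["0-22675", "22675-54588", "54588-109175", "109175+"]
AGE_BREAKS = [15, 25, 35, 45, 55, 65, 75, 85]
AGE_LABELS = ["0-14", "15-24", "25-34", "35-44", "45-54",
              "55-64", "65-74", "75-84", "85+"]


def income_category(income_2010):
    """Income category for a household income already in 2010 dollars.

    Zero or negative incomes fall in the lowest category.
    """
    return INCOME_LABELS[bisect_right(INCOME_BREAKS, income_2010)]


def age_category(age):
    """Age group used for the district-level (META) person controls."""
    return AGE_LABELS[bisect_right(AGE_BREAKS, age)]


def size_category(num_persons):
    """Household size category (1 to 5, then 6+)."""
    return "6+" if num_persons >= 6 else str(num_persons)


def workers_category(num_workers):
    """Workers-per-household category (0 to 2, then 3+)."""
    return "3+" if num_workers >= 3 else str(num_workers)


print(income_category(54588), age_category(85), size_category(7), workers_category(0))
# 54588-109175 85+ 6+ 0
```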
Table 2-2 Synthetic Population Household Table in Expanded Form
| FIELD | Description |
|---|---|
| TEMPID | Unexpanded household ID |
| DISTRICT2 | District code (ARC County) in which household is located |
| PUMA | PUMA code for household record |
| TAZ | TAZ (ARC PECAS zone) code in which household is located |
| MAZ | MAZ (ARC TAZ) code in which household is located |
| NEWWEIGHT | Housing weight |
| FINALPUMSID | HH ID generated during population synthesis |
| FINALWEIGHT | HH weight generated during population synthesis |
| SERIALNO | Unique housing PUMS record identifier |
| NP | Number of person records following this housing record |
| NWRKRS_ESR | Number of workers in the household |
| HHINC | Household income (past 12 months) |
| HHINCADJ | Adjusted household income |
| ADJINC | Adjustment factor for income and earnings dollar amounts (to 2011 dollars) |
| VEH | Vehicles available (1 ton or less) |
| HHT | Household/family type |
| BID | Units in structure |
| TYPE | Type of unit |
| FAMILYTYPE | Household/family type |
| N | Expansion ID of household |
| NUM_STUDS | Students |
| HHINCADJ_NGQ | Adjusted non-group quarters household income |
| N_NGQ | Expansion ID of non-group quarters household |
| NWRKRS_ESR_NGQ | Number of workers in non-group quarters household |
| HHID | Unique household ID |
Table 2-3 Synthetic Population Person Table in Expanded Form
| FIELD | Description |
|---|---|
| TEMPID | Unexpanded household ID |
| DISTRICT2 | District code (ARC County) in which household is located |
| PUMA | PUMA code for household record |
| TAZ | TAZ (ARC PECAS zone) code in which household is located |
| MAZ | MAZ (ARC TAZ) code in which household is located |
| NEWWEIGHT | Housing weight |
| FINALPUMSID | Person ID generated during population synthesis |
| FINALWEIGHT | Person weight generated during population synthesis |
| SPORDER | Person number in household |
| AGEP | Age (99 is 99+) |
| EMPLOYED | Whether the person is employed |
| PECAS_OCC | PECAS Occupation code |
| SEX | Sex (1 = male, 2 = female) |
| ESR | Employment status recode |
| WKW | Weeks worked during past 12 months |
| WKHP | Usual hours worked per week past 12 months |
| MIL | Military service |
| SCHG | Grade level attending |
| SCHL | Educational attainment |
| INDP02 | Industry 2002 recode |
| INDP07 | Industry 2007 recode |
| OCCP02 | Occupation 2002 recode |
| OCCP10 | Occupation 2010 recode |
| SOC | SOC major occupation |
| TYPE | Type of living unit |
| MAJUNIV | Major university student |
| MAJDORM | Major university student living in a campus dorm |
| OTHUNIV | Non-major university student |
| OTHDORM | Non-major university student living in a campus dorm |
| MAJNONDORM | Major university student not living in a campus dorm |
| CAMPUSTAZ | Major university student campus TAZ |
| CAMPEMPLIV | Major university student campus living unit type and employment status |
| EMP_STATUS | Major university student employment status |
| AGEP_NGQ | Age (non-group quarters) |
| PECASOCC_NGQ | PECAS Occupation code (non-group quarters) |
| N | Expansion ID of household |
| PERID | Unique person ID |
| HHID | Unique household ID |
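The expanded tables can be produced by repeating each household record once per unit of its final integer weight and attaching the household's person records to every copy. The sketch below assumes that `TEMPID` is the position of the unexpanded record, that `N` counts copies starting from 1, and that `HHID` and `PERID` are assigned sequentially; these numbering conventions and the `expand_population` name are assumptions, not documented behavior. For brevity, only `MAZ` is copied from the household to its persons.

```python
def expand_population(households, persons):
    """Expand weighted household records into one row per synthetic household.

    households -- list of dicts with at least SERIALNO, MAZ and FINALWEIGHT
    persons    -- SERIALNO -> list of person dicts, each with SPORDER
    Returns (household_rows, person_rows).
    """
    household_rows, person_rows = [], []
    next_hhid, next_perid = 1, 1
    for tempid, hh in enumerate(households, start=1):
        members = sorted(persons.get(hh["SERIALNO"], []), key=lambda p: p["SPORDER"])
        # One copy of the household per unit of integer weight
        for n in range(1, hh["FINALWEIGHT"] + 1):
            household_rows.append({**hh, "TEMPID": tempid, "N": n, "HHID": next_hhid})
            for person in members:
                person_rows.append({**person, "MAZ": hh["MAZ"], "TEMPID": tempid,
                                    "N": n, "HHID": next_hhid, "PERID": next_perid})
                next_perid += 1
            next_hhid += 1
    return household_rows, person_rows


hh_rows, per_rows = expand_population(
    [{"SERIALNO": "S1", "MAZ": 101, "FINALWEIGHT": 2},
     {"SERIALNO": "S2", "MAZ": 102, "FINALWEIGHT": 1}],
    {"S1": [{"SPORDER": 2, "AGEP": 10}, {"SPORDER": 1, "AGEP": 40}],
     "S2": [{"SPORDER": 1, "AGEP": 70}]},
)
print(len(hh_rows), len(per_rows))  # 3 5
```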