The population synthesizer software, called PopSyn, is a Java-based program that uses the free MySQL Community Server. It reads the Census PUMS household and person data from the database and writes the final synthetic population back to the same database. All the data tables are maintained in one database called ARCPopSyn. The database tables are described in Table 3-1. The details of each table can be found in the Appendix.
Table 3-1 PopSyn Database Tables
| Name | Description |
|---|---|
| psam_h13 | 2007-2011 ACS 5-year PUMS Household table for the region |
| psam_p13 | 2007-2011 ACS 5-year PUMS Person table for the region |
| hhtable | Household input table processed from PUMS household table |
| perstable | Person input table processed from PUMS person table |
| control_totals_maz | Table with control totals at MAZ level |
| control_totals_taz | Table with control totals at TAZ level |
| control_totals_meta | Table with meta-control totals at meta level |
| households | Synthetic population households output table |
| persons | Synthetic population persons output table |
The first step in setting up the database is to load all the Census ACS tables. The user should download the 2007-2011 5-year ACS household and person CSV data files for Georgia:
The user also needs the file socPECASCwlk.csv, which is a crosswalk between the SOC and PECAS occupation categories, in order to prepare the PUMS tables. Once the user has extracted the PUMS files from the archive, the user must run the PUMSTableCreation.sql script to create the hhtable and perstable tables. This script only needs to be run once. This script depends on the CSV file names specified in the csv_filenames table created by the run batch file described later. It can also be run with the run batch file.
In order to set up the control tables, the following input files are required:
The ControlsTableCreation.sql script, run by the run batch file, sets up the control tables: control_totals_maz, control_totals_taz, and control_totals_meta.
For the ControlsTableCreation.sql script to load the household information properly, it must reference the correct Hshld{YY}g.txt file name. For example, if the year 2020 is being run, the file name in the SQL script should read “hshld20g.txt”. This should already be set, but a mismatched file name is a frequent source of problems and is worth checking first when troubleshooting.
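The file-name check described above can be scripted before a run. The following is a minimal sketch, assuming the two-digit suffix is simply the last two digits of the scenario year (as in the “hshld20g.txt” example); the function names are hypothetical and are not part of PopSyn.

```python
def household_file_name(year: int) -> str:
    """Return the expected household input file name for a scenario year."""
    # Assumption: the suffix is the last two digits of the year, zero padded,
    # following the "hshld20g.txt" example for 2020.
    return f"hshld{year % 100:02d}g.txt"


def script_references_file(sql_text: str, year: int) -> bool:
    """Case-insensitive check that the SQL text mentions the expected file."""
    return household_file_name(year).lower() in sql_text.lower()


# Hypothetical SQL fragment used only to exercise the check.
sample_sql = "LOAD DATA LOCAL INFILE 'data/Hshld20g.txt' INTO TABLE hshld;"
print(household_file_name(2020), script_references_file(sample_sql, 2020))
```

In practice the text of ControlsTableCreation.sql would be read from the scripts directory and passed in place of the sample string.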
The new version of PopSyn matches both household and person distributions, as well as controls at multiple levels of geography. There are three main levels of geography at which controls can be set: microzones (MAZ), traffic analysis zones (TAZ), and districts. For ARC, these correspond to TAZs, PECAS zones, and counties, respectively. Controls at the district level (ARC County) are known as meta-controls. In addition, PUMAs must nest within meta-geographies. As a result, some of the counties were grouped, since each meta-geography must be at least as big as a PUMA. The controls being used for ARC are summarized in Table 3-2 below.
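The nesting requirement can be checked directly against a geographic crosswalk. The sketch below assumes a crosswalk with one row per MAZ and columns named PUMA and META; these column names and the sample rows are illustrative assumptions, not the actual layout of geographicCwalk.csv.

```python
import csv
import io
from collections import defaultdict


def pumas_split_across_meta(rows, puma_col="PUMA", meta_col="META"):
    """Return PUMAs that map to more than one meta geography."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[puma_col]].add(row[meta_col])
    # A PUMA nests properly only if it belongs to exactly one meta geography.
    return {puma: sorted(metas) for puma, metas in seen.items() if len(metas) > 1}


# Made-up crosswalk rows: PUMA 1003 is split across meta geographies 2 and 3.
sample = io.StringIO(
    "MAZ,TAZ,PUMA,META\n"
    "1,10,1001,1\n"
    "2,10,1001,1\n"
    "3,11,1002,1\n"
    "4,12,1003,2\n"
    "5,13,1003,3\n"
)
violations = pumas_split_across_meta(csv.DictReader(sample))
print(violations)
```

An empty result means every PUMA falls inside a single meta geography. A non-empty result points to the meta geographies that would need to be grouped.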
The raw PUMS data must be processed to generate the required control attributes. The household and person datasets are read into a temporary table. All group quarter (GQ) records are dropped from the datasets. Using the employment status attributes in the person database, the number of workers is assigned to each household. The SOC code is extracted for each person, and the socPECASCwlk.csv crosswalk is used to generate the occupation category.
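The processing steps above can be sketched in a few lines. This is an illustration only, not the logic of PUMSTableProcessing.sql: it assumes that TYPE "1" marks a housing unit (non-GQ) record, that ESR codes 1, 2, 4 and 5 mark employed persons, that the SOC code sits in a field called SOCP, and that the crosswalk is available as a simple SOC-to-category mapping.

```python
from collections import Counter

# Assumptions (not taken from the PopSyn scripts): which codes count as
# employed, and which TYPE value marks a non-group-quarters household.
EMPLOYED_ESR = {"1", "2", "4", "5"}
HOUSING_UNIT_TYPE = "1"


def process_pums(households, persons, soc_to_pecas):
    """Drop GQ records, count workers per household, and map occupations."""
    kept = {h["SERIALNO"]: dict(h) for h in households
            if h["TYPE"] == HOUSING_UNIT_TYPE}
    workers = Counter()
    processed_persons = []
    for p in persons:
        serial = p["SERIALNO"]
        if serial not in kept:
            continue  # person belongs to a dropped GQ record
        employed = p["ESR"] in EMPLOYED_ESR
        if employed:
            workers[serial] += 1
        # Occupation category is looked up only for employed persons.
        occ = soc_to_pecas.get(p["SOCP"]) if employed else None
        processed_persons.append({**p, "employed": int(employed), "pecasOcc": occ})
    for serial, h in kept.items():
        h["nwrkrs_esr"] = workers[serial]
    return list(kept.values()), processed_persons


# Made-up records: household B is a GQ record and is dropped.
households = [{"SERIALNO": "A", "TYPE": "1"}, {"SERIALNO": "B", "TYPE": "3"}]
persons = [
    {"SERIALNO": "A", "ESR": "1", "SOCP": "291141"},
    {"SERIALNO": "A", "ESR": "6", "SOCP": ""},
    {"SERIALNO": "B", "ESR": "1", "SOCP": "111021"},
]
hh_out, per_out = process_pums(households, persons, {"291141": "CL02Health"})
print(hh_out, per_out)
```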
The ARC PopSyn uses persons by occupation at the PECAS zone level as a control. To map that data into PopSyn, the PECAS total wages by occupation by PECAS zone are divided by the average wage by occupation from Census 2000 to create the total persons by occupation by PECAS zone. The occupation totals thus generated are scaled to match the households-by-number-of-workers controls defined earlier for the MAZ geography.
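The wage-to-persons conversion and the subsequent scaling can be illustrated with a small calculation. The sketch assumes a single target worker total per PECAS zone has already been derived from the worker controls; how that target is built is not specified here, and all numbers are invented.

```python
def persons_by_occupation(total_wages, avg_wage):
    """Persons in each occupation = total wages / average wage."""
    return {occ: total_wages[occ] / avg_wage[occ] for occ in total_wages}


def scale_to_total(values, target_total):
    """Scale all categories proportionally so they sum to target_total."""
    current = sum(values.values())
    if current == 0:
        return {k: 0.0 for k in values}
    factor = target_total / current
    return {k: v * factor for k, v in values.items()}


# Invented figures for one PECAS zone.
wages = {"CL02Health": 5_000_000.0, "CL05WhiteCollar": 12_000_000.0}
avg = {"CL02Health": 50_000.0, "CL05WhiteCollar": 60_000.0}

raw = persons_by_occupation(wages, avg)          # 100 and 200 persons
scaled = scale_to_total(raw, target_total=330)   # assumed worker target
print(raw, scaled)
```

Proportional scaling keeps the occupation shares from the wage data while forcing the zone total to agree with the worker-based target.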
Table 3-2 ARC Population Synthesis Controls
| Control | Categories | Geography | Data Source |
|---|---|---|---|
| Number of households | N/A | Region | ARC socio-economic forecast |
| Number of non-group quarter households | N/A | Region | ARC socio-economic forecast |
| Number of non-group quarter households by household size | 1, 2, 3, 4, 5, 6+ | MAZ (ARC TAZ) | ARC socio-economic forecast |
| Number of non-group quarter households by income (2010$) | 0-22675, 22675-54588, 54588-109175, 109175+ | MAZ (ARC TAZ) | ARC socio-economic forecast |
| Number of non-group quarter households by workers | 0, 1, 2, 3+ | MAZ (ARC TAZ) | CTPP 2010 |
| Number of major university students | Living on campus, living off campus | MAZ (ARC TAZ) | IPEDS Fall 2019 data |
| Number of persons by age | 0-14, 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85+ | District (ARC County group) | ARC socio-economic forecast |
| Number of persons by occupation | CL01BlueCollar, CL02Health, CL03RetailandFood, CL04Services, CL05WhiteCollar | TAZ (ARC PECAS zone) | PECAS model, CTPP PECAS zone level worker distribution, and total regional employment |
PopSyn is configured to run using the batch file runpopsyn.bat. The files that are critical for the population synthesizer are all housed in the directory runtime. Both the batch file and the runtime directory should also be present in the working directory at the time of program execution. A brief description of the file directory setup follows.
Table 3-3 Working Directory Contents
| File | Description |
|---|---|
| /data | Sub-directory with all input data files |
| /outputs | Sub-directory with log files and scenario specific outputs |
| /runtime | Sub-directory with all critical files |
| /scripts | Sub-directory with all scripts |
| /university_model | Sub-directory with major university model files |
| runPopSyn.bat | Batch file for running PopSyn |
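Because the batch file expects this layout, a quick pre-run check of the working directory can catch a missing folder early. The sketch below is one possible check; the entry names come from Table 3-3, while the function name and the temporary-directory demonstration are illustrative.

```python
import tempfile
from pathlib import Path

# Entries listed in Table 3-3.
REQUIRED_DIRS = ("data", "outputs", "runtime", "scripts", "university_model")
REQUIRED_FILES = ("runPopSyn.bat",)


def missing_entries(working_dir):
    """Return the required sub-directories and files that are not present."""
    root = Path(working_dir)
    missing = [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing


# Demonstration against a throwaway directory holding only two of the folders.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "runtime").mkdir()
    demo_missing = missing_entries(tmp)
print(demo_missing)
```

Calling `missing_entries` on the real working directory should return an empty list before runPopSyn.bat is started.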
Table 3-4 Data Directory Contents
| File | Description |
|---|---|
| ageScalingFactors.csv | Age scaling factors by age bin by county group |
| avgWagePUMS00.csv | Average wage by occupation and household type as per 2000 PUMS |
| EMP.csv | Number of jobs by TAZ |
| geographicCwalk.csv | Geographic correspondence between MAZ, TAZ, PUMA (Census 2000 definitions) and META geographies |
| HHByNumWorker.csv | Number of households with workers between 0 and 4 for each MAZ |
| Hshld{YR}g.txt | Number of households segmented by income and household size for each TAZ |
| LaborMakeAndUse.csv | Amount of labor make/use by occupation by PECAS zone |
| newHouseholdTable.csv | Final PUMS households table, including households from the major university model |
| newPersonTable.csv | Final PUMS persons table, including persons from the major university model |
| PecasZoneWorkerControls.csv | Number of workers by PECAS zone |
| persByOccpCountyDist.csv | Number of workers by sector by county group |
| personsByAge.csv | Persons by age group for each county in the model area |
| socPECASCwlk.csv | Crosswalk between the SOC and PECAS occupation categories |
| UniversityMazPopsynControls.csv | Number of major university students by campus living and employment location by TAZ, from the major university model |
Table 3-5 Outputs Directory Contents
| File | Description |
|---|---|
| control_totals_maz.csv | Control totals at the MAZ level CSV file |
| control_totals_meta.csv | Control totals for population bins at the District level CSV file |
| control_totals_taz.csv | Control totals at the TAZ level CSV file |
| event.log | Event log file containing details of the latest run |
| households.csv | Synthesized household CSV file |
| persons.csv | Synthesized person CSV file |
| runpopsyn.log | Program log file containing details of the latest run |
Table 3-6 Runtime Directory Contents
| File | Description |
|---|---|
| /config | Sub-directory with configuration files |
| /lib | Sub-directory with external libraries |
| common-base.jar | Java archive containing common modeling framework (CMF) code |
| popsyn3Unsigned.jar | Java archive containing PopSyn source code |
Table 3-7 Runtime/Config Directory Contents
| File | Description |
|---|---|
| log4j.xml | Configuration file for java logging library |
| jppf-clientLocal.properties | JPPF configuration file |
| settings.xml | PopSyn configuration file |
Table 3-8 Runtime/Lib Directory Contents
| File | Description |
|---|---|
| /JPFF-3.2.2 | JPPF libraries for distributed setup (not yet implemented) |
| /sed-4.2.1-bin | sed is a stream editor to perform basic text transformations on an input stream |
| /sed-4.2.1-dep | Dependencies of sed stream editor |
| com.google.ortools.linearsolver.jar | Linear programming solver Java libraries |
| jnilinearsolver.dll | Linear programming solver dependency |
| sqljdbc_auth.dll | SQL Server “Windows authentication” dependency |
| sqljdbc4.jar | Microsoft Java Database Connectivity driver |
Table 3-9 Scripts Directory Contents
| File | Description |
|---|---|
| ControlsTableCreation.sql | Import and process ARC control data |
| outputs.sql | Create expanded person and household tables |
| PUMSTableCreation.sql | Import PUMS data |
| PUMSTableProcessing.sql | Process PUMS data |
Table 3-10 University Model Contents
| File | Description |
|---|---|
| /code | Sub-directory of code necessary for the major university model |
| /input | Sub-directory of major university model inputs |
| /intermediate | Sub-directory of major university model intermediate files |
| /output | Sub-directory of major university model output files |
| runUniversityModel.bat | Batch file to run the major university model |
Table 3-11 University_Model/Code
| File | Description |
|---|---|
| /config | Sub-directory of major university model config files |
| /jarFiles | Sub-directory of major university model and CT-RAMP jar files |
| /lib | Sub-directory including java library files |
| /params | Sub-directory of major university model parameters |
| /R-4.0.2 | Sub-directory including R files |
| /runScripts | Sub-directory of major university model scripts |
Table 3-12 University_Model/Input
| File | Description |
|---|---|
| accessibility.csv | TAZ level automobile, transit or walk accessibility during peak or off-peak periods to retail or all employment |
| asuStudents.csv | Arizona State University (ASU) student survey expanded to the major university onsite enrollment input |
| expansionControls2019.csv | Control of major university enrollments by campus TAZ, gender, age and grades |
| sov_free_AM.tpp | SOV non-toll highway skims by AM period |
| ss11hga.csv | PUMS household file |
| ss11pga.csv | PUMS person file |
| ZoneData.csv | TAZ-level zonal data |
Table 3-13 University_Model/Intermediate
| File | |
|---|---|
| newHouseholdTable.csv | |
| newPersonTable.csv | |
| popsyn_hh_11.csv | |
| popsyn_p_11.csv | |
| studentSurvey_processed.csv | |
| universityMazControls.csv | |
| universityModelChoices.csv | |
Table 3-14 University_Model/Output
| File | Description |
|---|---|
| newHouseholdTable.csv | Final PUMS households table, including households from the major university model |
| newPersonTable.csv | Final PUMS persons table, including persons from the major university model |
| UniversityMazPopsynControls.csv | Number of major university students by campus living and employment location by TAZ, from the major university model |
Using the file settings.xml, the user can configure the database connection settings as well as specify database attributes that are to be used for balancing the controls. The settings available to the user are discussed below:
Table 3-15 Database Connection Setting ({database} attribute)
| ATTRIBUTE | DESCRIPTION |
|---|---|
| <server> | Name of the server where the database is stored and the TCP/IP port number for connecting the client. For MySQL Server, the default port is 3306. |
| <type> | Specifies the database engine being used. Currently ARC PopSyn is implemented in MySQL |
| <user> | Username for logging into the database server. |
| <password> | Password for the user specified above. |
| <dbName> | Name of the database where all the relevant tables are stored. |
XML Instance
<database>
<server>localhost</server>
<port>3306</port>
<type>MYSQL</type>
<user>root</user>
<password>root</password>
<dbName>ARCPopSyn</dbName>
</database>
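As a sanity check on these settings, the block can be parsed and turned into a connection string. The sketch below uses Python's standard XML parser on a copy of the instance above. The `jdbc:mysql://host:port/database` form is the usual MySQL JDBC URL layout, but whether PopSyn assembles its connection this way is an assumption, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of the XML instance shown above.
SETTINGS = """
<database>
  <server>localhost</server>
  <port>3306</port>
  <type>MYSQL</type>
  <user>root</user>
  <password>root</password>
  <dbName>ARCPopSyn</dbName>
</database>
"""


def read_database_settings(xml_text):
    """Return the database connection settings as a dictionary of strings."""
    root = ET.fromstring(xml_text)
    db = root if root.tag == "database" else root.find(".//database")
    if db is None:
        raise ValueError("no <database> element found")
    keys = ("server", "port", "type", "user", "password", "dbName")
    return {k: (db.findtext(k) or "").strip() for k in keys}


def jdbc_url(settings):
    """Build a MySQL JDBC-style URL (assumed form, for checking only)."""
    return f"jdbc:mysql://{settings['server']}:{settings['port']}/{settings['dbName']}"


db_settings = read_database_settings(SETTINGS)
print(jdbc_url(db_settings))
```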
Table 3-16 PUMS Table Setting ({pumsData} attribute)
| ATTRIBUTE | DESCRIPTION |
|---|---|
| <idField> | User generated unique ID for households in the PUMS table |
| <pumaFieldName> | The attribute in control tables which identifies the PUMA |
| <metaFieldName> | The attribute in control tables which identifies the meta geography |
| <tazFieldName> | The attribute in control table which identifies the TAZ |
| <mazFieldName> | The attribute in control table which identifies the MAZ |
| <weightField> | Weight field to be used in the PUMS tables as initial weights |
| <hhTable> | Name of the processed PUMS household table |
| <pumsHhIdField> | PUMS household ID field name |
| <persTable> | Name of the processed PUMS persons table |
| <pumsHhTable> | Name of the processed PUMS household table |
| <pumsPersTable> | Name of the processed PUMS persons table |
| <maxExpansionFactor> | Maximum household expansion factor weight setting |
| <synpopOutputHhTableName> | Name of synthesized households table (synpop_hh) |
| <synpopOutputPersTableName> | Name of synthesized persons table (synpop_per) |
| <outputHhAttributes> | PUMS household attributes to write out for the synthesized household |
| <outputPersAttributes> | PUMS Person Attributes to write out for persons in synthesized households |
XML Instance
<pumsData>
<idField>hhnum</idField>
<pumaFieldName>PUMA</pumaFieldName>
<metaFieldName>DISTRICT2</metaFieldName>
<tazFieldName>TAZ</tazFieldName>
<mazFieldName>MAZ</mazFieldName>
<weightField>WGTP</weightField>
<hhTable>hhtable</hhTable>
<persTable>perstable</persTable>
<pumsHhTable>hhtable</pumsHhTable>
<pumsHhIdField>hhnum</pumsHhIdField>
<pumsPersTable>perstable</pumsPersTable>
<maxExpansionFactor>15</maxExpansionFactor>
<synpopOutputHhTableName>synpop_hh</synpopOutputHhTableName>
<synpopOutputPersTableName>synpop_person</synpopOutputPersTableName>
<outputHhAttributes>serialno, np, nwrkrs_esr, hincp, hhincAdj, adjinc, veh, hht, bld, type</outputHhAttributes>
<outputPersAttributes>sporder, agep, employed, pecasOcc, sex, esr, wkw, wkhp, mil, schg, schl, indp02, indp07, occp02, occp10</outputPersAttributes>
</pumsData>
Table 3-17 Control Table Settings ([maz/taz/meta]ControlsTable attributes)
| ATTRIBUTE | DESCRIPTION |
|---|---|
| <table_name> | Name of the table where the [MAZ/TAZ/META] level controls are stored. |
| <id_field_name> | The attribute in control table that identifies [MAZ/TAZ/META] |
| <aggregation_level> | The geographic level to aggregate the data to. MAZ = MAZ, TAZ = TAZ, Meta = PUMA |
XML Instance
<mazControlsTable>
<mazTable id="1">
<table_name>control_totals_maz</table_name>
<id_field_name>MAZ</id_field_name>
<aggregation_level>MAZ</aggregation_level>
</mazTable>
</mazControlsTable>
Table 3-18 Specifying Controls ({target} attribute)
| ATTRIBUTE | DESCRIPTION |
|---|---|
| <marginals> | |
| <id> | The control ID. IDs are from 0 to the number of controls. |
| <description> | The description of the control being configured. |
| <totalHouseholdsControl> | This attribute must be set to “true” when configuring the control for total households at the regional level. For all other controls this attribute does not appear in the configuration. |
| <controlType> | * simple: Comparison to the control total is a simple check that the number of synthesized records matches the control. All household controls are simple controls. |
| | * count: Comparison to the control totals involves counting the matching person records from synthesized households and ensuring consistency with the control totals. All person controls are count controls. |
| <geographyType> | The geography at which the control has been specified. [MAZ/TAZ/META] |
| <table> | The seed table |
| <constraint> | |
| <importance> | Weights to adjust the importance of the control |
| <field> | The attribute in the seed (PUMS) table whose value is compared against the category definition (e.g., NP). |
| <controlField> | The attribute in the control table that holds the control total for this category (e.g., HHSIZE1). |
| <type> | The kind of comparison used to decide whether a household/person record qualifies for a control category. The comparison types are: |
| | * interval: The value in the field must fall within a range to qualify for the control category. |
| | * equality: The value in the field must equal a particular value to qualify for the control category. |
| If <type> is interval | |
| <lo_value> | Lower bound of the range that defines the control category |
| <lo_type> | * closed: if the range includes lo_value |
| | * open: if the range does not include lo_value |
| <hi_value> | Upper bound of the range that defines the control category |
| <hi_type> | * closed: if the range includes hi_value |
| | * open: if the range does not include hi_value |
| If <type> is equality | |
| <value> | Value that defines the control category |
XML Instance
<!-- Defining the Number of Households by Persons per Household control at MAZ level (first two categories shown) -->
<!-- Category 1: one-person household, <type> equality -->
<!-- Category 2: two-person household, <type> equality -->
<target>
<!-- Define conditions for the 6 persons per household controls for households -->
<marginals>
<id>1</id>
<description>MAZ Controls: Number of Households by Persons per Households</description>
<geographyType>MAZ</geographyType>
<controlType>simple</controlType>
<table>hhtable</table>
<constraint id="1">
<importance>100000</importance>
<field>NP</field>
<controlField>HHSIZE1</controlField>
<type>equality</type>
<value>1</value>
</constraint>
<constraint id="2">
<importance>100000</importance>
<field>NP</field>
<controlField>HHSIZE2</controlField>
<type>equality</type>
<value>2</value>
</constraint>
</marginals>
</target>
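The qualification rules for equality and interval constraints can be made concrete with a small function. This is a sketch of the comparison semantics described in Table 3-18, not PopSyn's internal code. The dictionary keys mirror the XML element names, and the closed/open bound choice in the income example is an assumption.

```python
def matches(value, constraint):
    """Return True if a record's field value qualifies for the control category."""
    kind = constraint["type"]
    if kind == "equality":
        return value == constraint["value"]
    if kind == "interval":
        lo, hi = constraint["lo_value"], constraint["hi_value"]
        # "closed" bounds include the boundary value, "open" bounds exclude it.
        above = value >= lo if constraint["lo_type"] == "closed" else value > lo
        below = value <= hi if constraint["hi_type"] == "closed" else value < hi
        return above and below
    raise ValueError(f"unknown constraint type: {kind}")


# One-person household category from the XML instance above.
one_person = {"type": "equality", "value": 1}

# Hypothetical income category 22675-54588; bound types are assumed here.
income_bin = {"type": "interval", "lo_value": 22675, "lo_type": "closed",
              "hi_value": 54588, "hi_type": "open"}

print(matches(1, one_person), matches(22675, income_bin), matches(54588, income_bin))
```

Pairing a closed lower bound with an open upper bound is one way to keep adjacent categories from both claiming a record that sits exactly on a boundary.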
PopSyn runs using the free MySQL Community Server. Download the MySQL Community Edition installer from https://dev.mysql.com/downloads/windows/installer/5.7.html. Note that there are two versions of the installer. Select mysql-installer-community-5.7.22 (the larger one) or a newer version. Run the .msi installer file and agree to the terms of the open-source license. The installer then steps through a number of pages that allow the user to specify the installation. In general, select the default or classic options. If asked which version to install, select x64 on a 64-bit machine or x86 on a 32-bit machine.
A MySQL database must be created prior to running PopSyn. Open the command line and execute the commands in Table 3-19 below.
Table 3-19 PopSyn Execution
| Commands | Action Description |
|---|---|
| > mysql -uroot -proot | Log into MySQL using username and password; these are both “root” by default unless the username or password is changed during installation of MySQL |
| mysql> create database arcpopsyn; | Creates empty database called “arcpopsyn” |
| mysql> exit | Exits the MySQL environment without leaving the command prompt window |
Note: if the default username “root” and/or default password “root” is changed during the installation of MySQL Community Server, be sure to replace those characters in the login command. However, it is not recommended that you choose an alternative name for the database. The name “arcpopsyn” is coded into the SQL scripts that run the model and would need to be manually changed to an alternative name.
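For repeated setups, the login and database creation steps can be combined into one scripted call. The sketch below only assembles the command by default. It relies on the mysql client's `-u`, `-p`, and `-e` options and uses `CREATE DATABASE IF NOT EXISTS`, a variation on the command in Table 3-19 that avoids an error when the database is already present. The function names are hypothetical.

```python
import subprocess


def create_database_command(user="root", password="root", db_name="arcpopsyn"):
    """Build the mysql client call that creates the PopSyn database."""
    sql = f"CREATE DATABASE IF NOT EXISTS {db_name};"
    # -e runs the statement and exits, so no interactive session is needed.
    return ["mysql", f"-u{user}", f"-p{password}", "-e", sql]


def create_database(dry_run=True, **kwargs):
    """Return the command; execute it only when dry_run is False."""
    cmd = create_database_command(**kwargs)
    if not dry_run:
        subprocess.run(cmd, check=True)  # requires mysql on the PATH
    return cmd


print(create_database())
```

Calling `create_database(dry_run=False)` would execute the command. Note that passing the password on the command line exposes it to other local users, so this is best limited to a local development machine.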
You do not have to clear the arcpopsyn database between PopSyn runs. However, a new run replaces the existing data, and the previous data cannot be recovered once overwritten. The households, persons, and control totals tables are exported as CSV files after each run as part of the SQL scripts, so it is unlikely that any other data will need to be extracted before being overwritten.
Additional commands that are useful for navigating the MySQL databases:
Table 3-20 PopSyn Additional Commands
| Additional Commands | Action Description |
|---|---|
| mysql> show databases; | Displays a list of all databases currently in the local MySQL server |
| mysql> use [database name here]; | Switches to a specific database for further exploration |
| mysql> show tables; | Displays a list of all the tables currently stored in active database |
| mysql> drop database [database name here]; | Deletes the given database and all associated tables entirely |
PopSyn is run with the runpopsyn.bat file. The batch file runs the major university model, which is embedded within PopSyn. The major university model identifies both group quarters and non-group quarters households with major university students in the input PUMS data, expands them to include all possible combinations of student campus choice and employment, and writes the augmented seed PUMS households and persons to the PopSyn input folder along with the university controls. The batch file runs PopSyn to create the household and person CSV files for use in the model, and creates the control totals CSV files for reference. The batch file can also be used to import the PUMS data. PopSyn needs to be run from a locally stored folder, such as the desktop or the C: drive.
To run a new scenario, including future year scenarios, the following PopSyn input files may need to be adjusted:
In addition, the following major university model input files may need to be adjusted:
At the end of the run, the synthesized household and person tables described in the Appendix are saved as CSV files in the output directory. The Control Totals tables are also exported and stored in the output directory for reference. The Control Totals tables are not necessary inputs for the ABM.