The population synthesizer software, called PopSyn, is a Java-based program that uses Microsoft’s free MySQL Server. It requires Census PUMS household and person data to be read from the database. The final synthetic population calculated by the software is written into the database as well. All the data tables are maintained in one database called ARCPopSyn. The database tables are described in Table 3-1. The details of each table can be found in the Appendix.
Table 3-1 PopSyn Database Tables
Name | Description |
---|---|
psam_h13 | 2007-2011 ACS 5-year PUMS Household table for the region |
psam_p13 | 2007-2011 ACS 5-year PUMS Person table for the region |
hhtable | HH input table processed from PUMS HH table |
perstable | Person input table processed from PUMS HH table |
control_totals_maz | Table with control totals at MAZ level |
control_totals_taz | Table with control totals at TAZ level |
control_totals_meta | Table with meta-control totals at meta level |
households | Synthetic population households output table |
persons | Synthetic population persons output table |
The first step to setup the database is to load all the Census ACS tables. The user should download the 2007-2011 5 year ACS household and person csv data file for Georgia:
* Household file: http://www2.census.gov/acs2011_5yr/pums/csv_hga.zip
* Person file: http://www2.census.gov/acs2011_5yr/pums/csv_pga.zip
The user also needs the file socPECASCwlk.csv, which is a crosswalk between the SOC and PECAS occupation categories, in order to prepare the PUMS tables. Once the user has extracted the PUMS files from the archive, the user must run the PUMSTableCreation.sql script to create the hhtable and perstable tables. This script only needs to be run once. This script depends on the CSV file names specified in the csv_filenames table created by the run batch file described later. It can also be run with the run batch file.
In order to setup the control tables, the following input files are required:
The ControlsTableCreation.sql script run by the run batch file setups up the control tables: control_totals_maz, control_total_taz, and control_totals_meta.
For the ControlsTableCreation.sql script to properly load the household information, it needs to be updated with the correct name of the Hshld[year]g.txt file. For example, if the year 2020 is being run, the file name should read “Hshld20g.txt” in the SQL script. This should already be done, but it can often be a troubleshooting issue.
The new version of the PopSyn matches both household and person distributions, as well as controls at multiple levels of geography. There are three main levels of geography at which controls can be set - microzones (MAZ), traffic analysis zones (TAZ), and district. For ARC, these correspond to TAZs, PECAS zones, and County. Controls at the district-level (ARC County) are known as meta-controls. In addition, PUMAs must nest within meta-geographies. As a result, some of the counties were grouped since each meta-geography must be at least as big as a PUMA. The controls being used for ARC are summarized in Table 3-2 below.
The raw PUMS data needs to be processed to generate the required control attributes. The household and person datasets are read into a temporary table. All group quarter (GQ) records are dropped from the datasets. Using employment status attributes in the person database, the number of workers is assigned to each household. The SOC code is extracted for each person and socPECASCwlk.csv crosswalk is used to generate the occupation category.
The ARC PopSyn uses the persons by occupation at the PECAS zone level as a control. In order to map that data into PopSyn, the PECAS total wages by occupation by PECAS zone are divided by the average wage by occupation from Census 2000 to create the total persons by occupation by PECAS zone. The occupation totals thus generated are scaled to match the number of households by numbers of worker controls defined earlier for the MAZ geography.
Table 3-2 ARC Population Synthesis Controls
Control | Categories | Geography | Data Source |
---|---|---|---|
Number of HHs | N/A | Region | ARC socio-economic forecast |
Number of HH by income | 0-25k, 25k-60k, | MAZ (ARC TAZ) | ARC socio-economic forecast |
60k-120k, 120k+ | |||
Number of HH by HH size | 1,2,3,4,5,6+ | MAZ (ARC TAZ) | ARC socio-economic forecast |
Number of HH by workers | 0,1,2,3+ | MAZ (ARC TAZ) | ARC workers per household shares from trip-based model |
Number of persons by age | 0-14, 15-24, 25-34, 35-44, 45-54, 55-64, | District (ARC County group) | ARC socio-economic forecast |
65-74, 75-84, 85+ | |||
Number of persons by occupation | CL23WhiteCollar, | TAZ (ARC PECAS zone) | PECAS model |
CL24Services, | |||
CL25Health, | |||
CL26Retail, | |||
CL27BlueCollar |
PopSyn is configured to run using the batch file runpopsyn.bat. The files that are critical for the population synthesizer are all housed in the directory runtime. Both the batch file and the runtime directory should also be present in the working directory at the time of program execution. A brief description of the file directory setup follows.
Table 3-3 Working Directory Contents
File | Description |
---|---|
runPopSyn.bat | Batch file for running PopSyn |
/runtime | Sub-directory with all critical files |
/outputs | Sub-directory with log file and scenario specific output |
Table 3-4 Runtime Directory Contents
File | Description |
---|---|
popsyn3Unsigned.jar | Java archive containing PopSyn source code |
common-base.jar | Java archive contains common modeling framework (CMF) code |
/lib | Sub-directory with external libraries |
/config | Sub-directory with configuration files |
Table 3-5 Outputs Directory Contents
File | Description |
---|---|
event.log | Event log file containing details of the latest run |
persons.csv | Synthesized person CSV file |
households.csv | Synthesized household CSV file |
control_totals_maz.csv | Control totals at the MAZ level CSV file |
control_totals_meta.csv | Control totals for population bins at the District level CSV file |
control_totals_taz.csv | Control totals at the TAZ level CSV file |
Table 3-6 Scripts Directory Contents
File | Description |
---|---|
ControlsTableCreation.sql | Import and process ARC control data |
PUMSTableCreation.sql | Import PUMS data |
PUMSTableProcessing.sql | Process PUMS data |
outputs.sql | Create expanded person and household tables |
Table 3-7 Lib Directory Contents
File | Description |
---|---|
sqljdbc4.jar | Microsoft Java Database Connectivity driver |
com.google.ortools.linearsolver.jar | Linear programming solver Java libraries |
sqljdbc_auth.dll | SQL Server “Windows authentication” dependency |
jnilinearsolver.dll | Linear programming solver dependency |
/JPFF-3.2.2 | JPFF libraries for distributed setup (not yet implemented) |
sed-4.2.1-bin | sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). |
sed-4.2.1-dep | Dependencies of sed stream editor |
Table 3-8 Config Directory Contents
File | Description |
---|---|
log4j.xml | Configuration file for java logging library |
settings.xml | PopSyn configuration file |
jppf-clientLocal.properties | JPPF configuration file |
Using the file settings.xml, the user can configure the database connection settings as well as specify database attributes that are to be used for balancing the controls. The settings available to the user are discussed below:
Table 3-9 Database Connection Setting ({database} attribute)
ATTRIBUTE | DESCRIPTION |
---|---|
<server> | Name of the server where the database is stored and the TCP/IP port number for connecting the client. For MySQL Server,the default port is 3306. |
<type> | Specifies the database engine being used. Currently ARC PopSyn is implemented in MySQL |
<user> | Username for logging into the database server. |
<password> | Password for the user specified above. |
<dbName> | Name of the database where all the relevant tables are stored. |
XML Instance
<database>
<server>localhost</server>
<port>3306</port>
<type>MYSQL</type>
<user>root</user>
<password>root</password>
<dbName>ARCPopSyn</dbName>
</database>
Table 3-10 PUMS Table Setting ({pumsData} attribute)
ATTRIBUTE | DESCRIPTION |
---|---|
<idField> | User generated unique ID for households in the PUMS table |
<pumaFieldName> | The attribute in control tables which identifies the PUMA |
<metaFieldName> | The attribute in control tables which identifies the meta geography |
<tazFieldName> | The attribute in control table which identifies the TAZ |
<mazFieldName> | The attribute in control table which identifies the MAZ |
<weightField> | Weight field to be used in the PUMS tables as initial weights |
<hhTable> | Name of the processed PUMS household table |
<pumsHhIdField> | PUMS HH ID field name |
<persTable> | Name of the processed PUMS persons table |
<pumsHhTable> | Name of the processed PUMS household table |
<pumsPersTable> | Name of the processed PUMS persons table |
<maxExpansionFactor> | Maximum HH expansion factor weight setting |
<synpopOutputHhTableName> | Name of synthesized households table (synpop_hh) |
<synpopOutputPersTableName> | Name of synthesized persons table (synpop_per) |
<outputHhAttributes> | PUMS HH Attributes to write out for the synthesized household |
<outputPersAttributes> | PUMS Person Attributes to write out for persons in synthesized HHs |
XML Instance
<pumsData>
<idField>hhnum</idField>
<pumaFieldName>PUMA</pumaFieldName>
<metaFieldName>DISTRICT2</metaFieldName>
<tazFieldName>TAZ</tazFieldName>
<mazFieldName>MAZ</mazFieldName>
<weightField>WGTP</weightField>
<hhTable>hhtable</hhTable>
<persTable>perstable</persTable>
<pumsHhTable>hhtable</pumsHhTable>
<pumsHhIdField>hhnum</pumsHhIdField>
<pumsPersTable>perstable</pumsPersTable>
<maxExpansionFactor>15</maxExpansionFactor>
<synpopOutputHhTableName>synpop_hh</synpopOutputHhTableName>
<synpopOutputPersTableName>synpop_person</synpopOutputPersTableName>
<outputHhAttributes>serialno, np, nwrkrs_esr, hincp, hhincAdj, adjinc, veh, hht, bld, type</outputHhAttributes>
<outputPersAttributes>sporder, agep, employed, pecasOcc, sex, esr, wkw, wkhp, mil, schg, schl, indp02, indp07, occp02, occp10</outputPersAttributes>
</pumsData>
Table 3-11 Control Table Settings ([maz/taz/meta]ControlsTable attributes)
ATTRIBUTE | DESCRIPTION |
---|---|
<table_name> | Name of the table where the [MAZ/TAZ/META] level controls are stored. |
<id_field_name> | The attribute in control table that identifies [MAZ/TAZ/META] |
<aggregation_level> | The geographic level to aggregate the data to. MAZ = MAZ, TAZ = TAZ, Meta = PUMA |
XML Instance
<mazControlsTable>
<mazTable id="1">
<table_name>control_totals_maz</table_name>
<id_field_name>MAZ</id_field_name>
<aggregation_level>MAZ</aggregation_level>
</mazTable>
</mazControlsTable>
Table 3-12 Specifying Controls ({target} attribute)
ATTRIBUTE | DESCRIPTION |
---|---|
<marginals> | |
<id> | The control ID. IDs are from 0 to the number of controls. |
<description> | The description of the control being configured. |
<totalHouseholdsControl> | This attribute needs to be made “true” when configuring the file for controlling the total households at the regional level. For all other controls this attribute does not appear in the configuration. |
<controlType> | * simple: Comparison to the control total is just a simple check to see if the number of synthesized records matches the control. All HH controls are simple controls. |
* count: Comparison to the control totals involves counting up the number of matching person records from synthesized HHs and ensuring consistency with the control totals. All person controls are count controls. | |
<geographyType> | The geography at which the control has been specified. [MAZ/TAZ/META] |
<table> | The seed table |
<constraint> | |
<importance> | Weights to adjust the importance of the control |
<field> | Attribute in the PUMS table that specifies the initial weights. |
<controlField> | The attribute in the PUMS table that corresponds to the control being set. |
<type> | Depending on nature of comparison being carried out a household/person record will qualify into a control category. The comparison types are: |
* interval: If the values in the controlField needs to be compared to a range in order to qualify into the control category. | |
* equality: If the values in the controlField needs to be equal to a particular value to qualify into the control category. | |
If <type> is interval | |
<lo_value> | Lower bound of the range that defines the control category |
<lo_type> | * closed: if the range includes lo_value |
* open: if the range does not include lo_value | |
<hi_value> | Upper bound of the range that defines the control category |
<hi_type> | * closed: if the range includes hi_value |
* open: if the range does not include hi_value | |
If <type> is equality | |
<value> | Value that defines the control category |
XML Instance
<!-- Defining a 2 category control for Number of HHs by Persons per HH at MAZ level -->
<!-- Category 1: One person household <type> equality-->
<!-- Category 2: 2+ person household <type> interval-->
<target>
<!-- Define conditions for the 6 persons per household controls for households -->
<marginals>
<id>1</id>
<description>MAZ Controls: Number of Households by Persons per Households</description>
<geographyType>MAZ</geographyType>
<controlType>simple</controlType>
<table>hhtable</table>
<constraint id="1">
<importance>100000</importance>
<field>NP</field>
<controlField>HHSIZE1</controlField>
<type>equality</type>
<value>1</value>
</constraint>
<constraint id="2">
<importance>100000</importance>
<field>NP</field>
<controlField>HHSIZE2</controlField>
<type>equality</type>
<value>2</value>
</constraint>
</marginals>
</target>
PopSyn runs using Microsoft’s free MySQL software. Download and install the MySQL Community Edition install from https://dev.mysql.com/downloads/windows/installer/5.7.html. Note that there are two versions of the software. Select the mysql-install-community-5.7.22 (the larger one) or newer versions. Run the .msi installer file and agree to the terms of the open-source license. The installer will then step through a number of pages allowing the user to specify the installation. In general, select the default or classic options. If asked what version to install, select x64 on a 64-bit machine or x86 for a 32-bit machine.
A MySQL database must be created prior to running PopSyn. Open the command line and execute the following commands:
Table 3-13 PopSyn Execution
Commands | Action Description |
---|---|
> mysql -uroot -proot | Log into MySQL using username and password; these are both “root” by default unless the username or password is changed during installation of MySQL |
mysql> create database arcpopsyn; | Creates empty database called “arcpopsyn” |
mysql> exit | Exits the MySQL environment without leaving the command prompt window |
Note: if the default username “root” and/or default password “root” is changed during the installation of MySQL Community Server, be sure to replace those characters in the login command. However, it is not recommended that you choose an alternative name for the database. The name “arcpopsyn” is coded into the SQL scripts that run the model and would need to be manually changed to an alternative name.
You do not have to clear the arcpopsyn database between multiple PopSyn runs, however a new run will replace the data and the previous data will not be recoverable if overwritten. The households, persons, and control totals tables will be exported as CSV files after each run as part of the SQL scripts, and it is unlikely that any other data will need to be extracted before being overwritten.
Additional commands that are useful to navigating the MySQL databases:
Table 3-14 PopSyn Additional Commands
Additional Commands | Action Description |
---|---|
mysql> show databases; | Displays a list of all databases currently in the local MySQL server |
mysql> use [database name here]; | Switches to a specific database for further exploration |
mysql> show tables; | Displays a list of all the tables currently stored in active database |
mysql> drop database [database name here]; | Deletes the given database and all associated tables entirely |
PopSyn is run with the runpopsyn.bat file. The batch file creates the control tables, runs PopSyn, creates the household and persons CSV files for use in the model, and creates the control totals CSV files for reference. The batch file can also be used to import the PUMS data. PopSyn needs to be run from a locally-stored folder, such as the desktop or the C: drive.
To run a new scenario, including future year scenarios, the follow input files need to be adjusted:
* EMP.csv
* Hshld{year}g.txt
* LaborMakeAndUse.csv
* PersonsByAge.csv
* avgWagePUMS00.csv (if new data is available)
At the end of the run, the synthesized household and person tables described in the Appendix are saved as CSV files in the output directory. The Control Totals tables are also exported and stored in the output directory for reference. The Control Totals tables are not necessary inputs for the ABM.