This section describes the validation of the population synthesis procedure. Validation assesses the convergence of the synthesis routine by comparing the synthetic population it outputs with the controls. Table 2-1 summarizes the comparison at the regional level. For each control, the table shows the total number of records (households or persons) specified by the control (Observed), the total number of records synthesized (Predicted), the percentage difference between the synthesized and control totals (% Diff), the percentage root mean squared error (% RMSE), and the standard deviation of the absolute error (STDEV).
Controls | Observed | Predicted | % Diff | % RMSE | STDEV |
---|---|---|---|---|---|
MAZ Controls | | | | | |
Total number of households | 2,166,392 | 2,166,392 | 0.00 | 0.00 | 0.00 |
Household Size = 1 | 571,160 | 571,257 | 0.02 | 4.67 | 4.89 |
Household Size = 2 | 654,065 | 654,090 | 0.00 | 2.79 | 10.73 |
Household Size = 3 | 393,943 | 393,908 | -0.01 | 3.90 | 19.04 |
Household Size = 4 | 321,203 | 321,124 | -0.02 | 3.75 | 15.77 |
Household Size = 5 | 135,869 | 135,867 | 0.00 | 8.77 | 12.12 |
Household Size = 6+ | 90,152 | 90,146 | -0.01 | 19.26 | 44.05 |
Household Income: Less than $24,999 | 322,457 | 322,413 | -0.01 | 16.39 | 7.04 |
Household Income: $25,000 to $59,999 | 702,169 | 701,845 | -0.05 | 4.54 | 3.71 |
Household Income: $60,000 to $119,999 | 757,575 | 757,720 | 0.02 | 3.02 | 5.94 |
Household Income: $120,000 or more | 384,191 | 384,414 | 0.06 | 4.43 | 16.75 |
Number of workers = 0 | 380,875 | 390,215 | 2.45 | 39.16 | 13.83 |
Number of workers = 1 | 910,383 | 913,223 | 0.31 | 7.27 | 6.87 |
Number of workers = 2 | 735,308 | 723,652 | -1.59 | 23.25 | 10.88 |
Number of workers = 3+ | 139,826 | 139,302 | -0.37 | 14.91 | 8.16 |
TAZ Controls | | | | | |
Workers with WhiteCollar Occupation | 1,294,665 | 1,318,477 | 1.84 | 3.60 | 5.00 |
Workers with Services Occupation | 287,661 | 293,414 | 2.00 | 3.96 | 5.07 |
Workers with Health Occupation | 161,507 | 164,501 | 1.85 | 3.57 | 5.03 |
Workers with Retail Occupation | 487,140 | 496,732 | 1.97 | 3.89 | 5.08 |
Workers with BlueCollar Occupation | 530,507 | 541,463 | 2.07 | 3.97 | 5.06 |
Meta Controls | | | | | |
Age 0 - 14 years | 1,200,144 | 1,214,458 | 1.19 | 1.67 | 0.94 |
Age 15 to 24 years | 744,999 | 755,120 | 1.36 | 1.87 | 1.09 |
Age 25 to 34 years | 787,634 | 797,503 | 1.25 | 1.75 | 1.09 |
Age 35 to 44 years | 818,545 | 829,270 | 1.31 | 1.79 | 1.08 |
Age 45 to 54 years | 821,192 | 832,828 | 1.42 | 1.94 | 1.15 |
Age 55 to 64 years | 646,473 | 655,176 | 1.35 | 1.83 | 1.11 |
Age 65 to 74 years | 396,256 | 401,548 | 1.34 | 1.77 | 1.11 |
Age 75 to 84 years | 162,502 | 164,667 | 1.33 | 1.75 | 1.08 |
Age 85 plus | 56,294 | 56,999 | 1.25 | 1.78 | 1.09 |
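The summary columns in the table above could be computed along the following lines. This is a minimal sketch, not the procedure actually used to produce Table 2-1: the function name `validation_stats` is hypothetical, and the exact definitions are assumptions, since the text does not spell them out. Here % RMSE is taken as the RMSE across geographies divided by the mean control value, and STDEV is taken as the population standard deviation of the per-geography error (synthesized minus control).

```python
import math
import statistics


def validation_stats(control, synthesized):
    """Summarize how well synthesized counts match control counts.

    Both arguments are equal-length sequences with one entry per geography
    (for example, one per MAZ). The metric definitions are assumptions made
    for illustration; they are not taken from the report.
    """
    if len(control) != len(synthesized) or not control:
        raise ValueError("inputs must be non-empty and of equal length")

    # Per-geography error: synthesized minus control.
    errors = [s - c for c, s in zip(control, synthesized)]
    total_control = sum(control)
    total_synth = sum(synthesized)

    # Percentage difference between the regional totals.
    pct_diff = 100.0 * (total_synth - total_control) / total_control

    # RMSE across geographies, as a percentage of the mean control value.
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    pct_rmse = 100.0 * rmse / (total_control / len(control))

    # Population standard deviation of the per-geography error.
    stdev = statistics.pstdev(errors)

    return {
        "observed": total_control,
        "predicted": total_synth,
        "pct_diff": pct_diff,
        "pct_rmse": pct_rmse,
        "stdev": stdev,
    }


# Hypothetical three-zone example (values are made up for illustration).
example = validation_stats([100, 200, 300], [102, 198, 303])
```

Applied to regional totals alone, the percentage difference reproduces the % Diff column; for instance, the "Number of workers = 0" row gives 100 × (390,215 − 380,875) / 380,875 ≈ 2.45. The other two columns need the per-geography values, which are not listed in the table.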
The results in Table 2-1 indicate that the population synthesizer matches the controls reasonably well overall. The regional percentage differences are within about 2.5 percent in absolute value for every control, and within 0.1 percent for the household size and household income controls. The RMSE and standard deviation values are also low for most controls at all three geographies: % RMSE stays below 4 percent for the TAZ controls and below 2 percent for the meta controls. However, at any geography there is more variation in matching controls that represent a relatively small market segment, such as household size 6+ (% RMSE of 19.26) or number of workers = 0 (% RMSE of 39.16). This is an artifact of the multiple dimensions being constrained, coupled with the lack of records in the seed data for these market segments. Together these make it more difficult to sample the correct number of records that match all of the constraints for specific geographies.
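A small numerical illustration may help in reading the % RMSE values for small segments. It does not reproduce the seed-data effect described above; it only shows that, if % RMSE is normalized by the mean control value (an assumption, as the definition is not given in the text), the same absolute error per geography produces a much larger % RMSE for a small segment. The function name `pct_rmse` and all zone values are hypothetical.

```python
import math


def pct_rmse(control, synthesized):
    """RMSE across geographies as a percentage of the mean control value.

    This normalization is an assumption made for illustration.
    """
    errors = [s - c for c, s in zip(control, synthesized)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return 100.0 * rmse / (sum(control) / len(control))


# Hypothetical per-zone controls for a large and a small market segment.
large_control = [200, 220, 180, 200]
small_control = [4, 6, 2, 4]

# The same absolute error (one record too many or too few) in every zone.
offsets = [1, -1, 1, -1]
large_synth = [c + d for c, d in zip(large_control, offsets)]
small_synth = [c + d for c, d in zip(small_control, offsets)]

large_pct_rmse = pct_rmse(large_control, large_synth)  # 0.5
small_pct_rmse = pct_rmse(small_control, small_synth)  # 25.0
```

In both cases the regional totals match exactly, so the percentage difference is zero, while the % RMSE differs by a factor of 50. This is consistent with the pattern in the table, where small segments show near-zero % Diff alongside comparatively high % RMSE.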
Figure 2-1 visualizes the mean bias and +/- one standard deviation for each control. The figure reinforces the results discussed above.
Figure 3-2 and Figure 3-3 below show scatter plots for two controls: the total number of households, which is synthesized with very low bias and RMSE, and the number of households of size 6 or more, which is synthesized with higher bias and RMSE. Each point in a scatter plot represents the control and synthesized values for the corresponding control. The scatter plots again confirm the earlier conclusions about convergence for these two controls.