This section describes the validation of the population synthesis procedure. Validation assesses the convergence of the synthesis routine by comparing the synthetic population it produces against the controls. Table 2-1 summarizes the comparison at the regional level. For each control, the table shows the total number of records (households or persons) specified by the control (Observed), the total number of records synthesized (Predicted), the percentage difference between the synthesized and control totals, the percentage root mean squared error (RMSE), and the standard deviation of the absolute error.
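As a minimal sketch of how the summary statistics described above could be computed for a single control, the function below takes per-geography control and synthesized counts. This section does not give the exact formulas, so the definitions used here are assumptions: the percentage difference is taken on regional totals, the percentage RMSE is the RMSE of the per-geography differences divided by the mean control value, and the standard deviation is taken over the per-geography differences in record counts. The function name `validation_summary` and the sample counts are hypothetical.

```python
import math
import statistics


def validation_summary(control, synthesized):
    """Summarize how well one control is matched across geographies.

    control, synthesized: equal-length sequences of per-geography counts.
    The formulas below are assumed definitions, not taken from the report.
    """
    if not control or len(control) != len(synthesized):
        raise ValueError("inputs must be non-empty and of equal length")

    diffs = [s - c for c, s in zip(control, synthesized)]
    observed = sum(control)        # regional control total
    predicted = sum(synthesized)   # regional synthesized total

    # Percentage difference between the regional totals.
    pct_diff = 100.0 * (predicted - observed) / observed

    # RMSE of per-geography differences, as a percentage of the mean control.
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    pct_rmse = 100.0 * rmse / (observed / len(control))

    # Population standard deviation of the per-geography differences.
    stdev = statistics.pstdev(diffs)

    return {
        "observed": observed,
        "predicted": predicted,
        "pct_diff": pct_diff,
        "pct_rmse": pct_rmse,
        "stdev": stdev,
    }


# Hypothetical counts for four geographies (not from the report).
control = [100, 200, 300, 400]
synthesized = [102, 198, 300, 404]
summary = validation_summary(control, synthesized)
print(summary)
```

Dividing the RMSE by the mean control value makes the measure comparable across controls of very different size, which is one plausible reason to report it as a percentage.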

Table 2-1. Validation Summary for controls
Controls                                  Observed  Predicted  % Diff  % RMSE  STDEV
MAZ Controls
Total number of households               2,166,392  2,166,392    0.00    0.00   0.00
Household Size = 1                         571,160    571,257    0.02    4.67   4.89
Household Size = 2                         654,065    654,090    0.00    2.79  10.73
Household Size = 3                         393,943    393,908   -0.01    3.90  19.04
Household Size = 4                         321,203    321,124   -0.02    3.75  15.77
Household Size = 5                         135,869    135,867    0.00    8.77  12.12
Household Size = 6+                         90,152     90,146   -0.01   19.26  44.05
Household Income: Less than $24,999        322,457    322,413   -0.01   16.39   7.04
Household Income: $25,000 to $59,999       702,169    701,845   -0.05    4.54   3.71
Household Income: $60,000 to $119,999      757,575    757,720    0.02    3.02   5.94
Household Income: $120,000 or more         384,191    384,414    0.06    4.43  16.75
Number of workers = 0                      380,875    390,215    2.45   39.16  13.83
Number of workers = 1                      910,383    913,223    0.31    7.27   6.87
Number of workers = 2                      735,308    723,652   -1.59   23.25  10.88
Number of workers = 3+                     139,826    139,302   -0.37   14.91   8.16
TAZ Controls
Workers with WhiteCollar Occupation      1,294,665  1,318,477    1.84    3.60   5.00
Workers with Services Occupation           287,661    293,414    2.00    3.96   5.07
Workers with Health Occupation             161,507    164,501    1.85    3.57   5.03
Workers with Retail Occupation             487,140    496,732    1.97    3.89   5.08
Workers with BlueCollar Occupation         530,507    541,463    2.07    3.97   5.06
Meta Controls
Age 0 to 14 years                        1,200,144  1,214,458    1.19    1.67   0.94
Age 15 to 24 years                         744,999    755,120    1.36    1.87   1.09
Age 25 to 34 years                         787,634    797,503    1.25    1.75   1.09
Age 35 to 44 years                         818,545    829,270    1.31    1.79   1.08
Age 45 to 54 years                         821,192    832,828    1.42    1.94   1.15
Age 55 to 64 years                         646,473    655,176    1.35    1.83   1.11
Age 65 to 74 years                         396,256    401,548    1.34    1.77   1.11
Age 75 to 84 years                         162,502    164,667    1.33    1.75   1.08
Age 85 plus                                 56,294     56,999    1.25    1.78   1.09

The results in Table 2-1 indicate that the population synthesizer matches the controls reasonably well overall. At the regional level, the percentage difference is within about 2.5 percent in magnitude for every control, and within 0.1 percent for the household size and household income controls. For all three geographies (MAZ, TAZ, and meta), the RMSE and the standard deviation are low for most controls. However, at every geography there is more variation in matching controls that represent a relatively small market segment, such as household size 6+ (RMSE of 19.26 percent) or workers = 0 (RMSE of 39.16 percent). This is an artifact of constraining multiple dimensions at once while the seed data contain few records for these segments, which makes it harder to sample the correct number of records that satisfy all of the constraints for specific geographies.
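The regional percentage differences in Table 2-1 can be reproduced from the Observed and Predicted columns. The sketch below does so for three rows, assuming the percentage difference is defined as 100 × (Predicted − Observed) / Observed; the report does not state this definition, but it reproduces the tabulated values. Using hypothetical per-geography counts, it also illustrates why a near-zero regional difference can coexist with a large RMSE: offsetting errors cancel in the total but not in the squared error. The percentage RMSE is again assumed to be normalized by the mean control value.

```python
import math


def pct_diff(observed, predicted):
    """Percentage difference of predicted relative to observed (assumed definition)."""
    return 100.0 * (predicted - observed) / observed


# (Observed, Predicted) pairs copied from Table 2-1.
table_rows = {
    "Household Size = 6+": (90_152, 90_146),
    "Number of workers = 0": (380_875, 390_215),
    "Number of workers = 2": (735_308, 723_652),
}
for name, (obs, pred) in table_rows.items():
    print(f"{name}: {pct_diff(obs, pred):.2f}%")

# Hypothetical per-geography counts (not from the report) with offsetting errors.
control = [10, 10, 10, 10]
synthesized = [14, 6, 13, 7]

# The regional totals agree exactly, so the percentage difference is zero...
regional_pct_diff = pct_diff(sum(control), sum(synthesized))

# ...but the per-geography errors are large relative to the mean control value.
rmse = math.sqrt(
    sum((s - c) ** 2 for c, s in zip(control, synthesized)) / len(control)
)
pct_rmse = 100.0 * rmse / (sum(control) / len(control))

print(f"regional % diff: {regional_pct_diff:.2f}, % RMSE: {pct_rmse:.2f}")
```

Rounded to two decimals, the three table rows give -0.01, 2.45, and -1.59, matching the % Diff column.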

Figure 2-1 visualizes the mean bias, plus or minus one standard deviation, for the controls. The figure reinforces the results discussed above.

Figure 3-2 and Figure 3-3 below show scatterplots for two controls: the total number of households, which is synthesized with very low bias and RMSE, and the number of households of size 6 or more, which is synthesized with higher bias and RMSE. Each point in a scatterplot represents the control and synthesized values for the corresponding control. The scatterplots confirm the earlier conclusion about convergence for these two controls.





Atlanta Regional Commission, 2019