This section describes the validation of the population synthesis procedure. Validation assesses the convergence of the synthesis routine by comparing the output synthetic population with the controls, and Table 2-1 summarizes the comparison. For each control, summarized at the regional level, the table shows the total number of records (households or persons) required by the control (Observed), the total number of records synthesized (Predicted), the percentage difference between the synthesized totals and the control totals (% Diff), and the percentage root mean squared error (% RMSE) of the differences across the zones of the control geography. The standard deviation of the error is visualized in Figure 2-1.
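To make the two table metrics concrete, the sketch below computes them from zone-level control and synthesized counts. It is a minimal illustration, not the synthesizer's own reporting code. It assumes % Diff is the difference of the regional totals relative to the control total, and % RMSE is the RMSE of the zone-level differences divided by the mean zone-level control. The function names `percent_difference` and `percent_rmse` are hypothetical.

```python
import math
from typing import Sequence


def percent_difference(controls: Sequence[float], synthesized: Sequence[float]) -> float:
    """Regional % difference: (synthesized total - control total) / control total."""
    total_control = sum(controls)
    if total_control == 0:
        raise ValueError("control total must be non-zero")
    return 100.0 * (sum(synthesized) - total_control) / total_control


def percent_rmse(controls: Sequence[float], synthesized: Sequence[float]) -> float:
    """Zone-level RMSE expressed as a percentage of the mean zone-level control.

    Assumption: this normalization (RMSE / mean control) is illustrative and
    may differ from the exact definition used to produce Table 2-1.
    """
    if len(controls) != len(synthesized) or not controls:
        raise ValueError("need equal-length, non-empty zone lists")
    n = len(controls)
    squared_errors = sum((s - c) ** 2 for c, s in zip(controls, synthesized))
    rmse = math.sqrt(squared_errors / n)
    mean_control = sum(controls) / n
    if mean_control == 0:
        raise ValueError("mean control must be non-zero")
    return 100.0 * rmse / mean_control


# Hypothetical example: four zones whose errors cancel at the regional level.
controls = [100, 50, 0, 30]
synthesized = [98, 53, 1, 28]
print(percent_difference(controls, synthesized))  # regional totals match exactly
print(percent_rmse(controls, synthesized))        # zone-level error is still non-zero
```

The example shows why both columns are reported. Regional totals can match exactly (0% difference) even when individual zones are over- or under-synthesized, and only the zone-level % RMSE reveals that.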
Table 2-1. Validation Summary for Controls
| Control | Observed (Control Total) | Predicted (Synthesized Total) | % Diff | % RMSE |
|---|---|---|---|---|
| **MAZ Controls** | | | | |
| Total number of households | 2,290,007 | 2,289,231 | - | - |
| Total number of non group quarter households | 2,257,252 | 2,256,479 | 0.00% | 1.24% |
| Household Size = 1 | 571,803 | 572,032 | 0.00% | 19.84% |
| Household Size = 2 | 694,694 | 694,752 | 0.00% | 7.98% |
| Household Size = 3 | 376,099 | 376,031 | 0.00% | 10.52% |
| Household Size = 4 | 331,686 | 331,443 | -0.10% | 10.50% |
| Household Size = 5 | 165,869 | 165,529 | -0.20% | 17.81% |
| Household Size = 6+ | 117,101 | 116,692 | -0.30% | 43.53% |
| Household Income: Less than $22,675 | 386,036 | 386,297 | 0.10% | 28.72% |
| Household Income: $22,675 to $54,588 | 649,459 | 649,668 | 0.00% | 11.29% |
| Household Income: $54,588 to $109,175 | 718,295 | 717,801 | -0.10% | 8.27% |
| Household Income: $109,175 or more | 503,462 | 502,713 | -0.10% | 15.88% |
| Number of workers = 0 | 394,384 | 410,257 | 4.00% | 106.99% |
| Number of workers = 1 | 951,026 | 959,127 | 0.90% | 29.67% |
| Number of workers = 2 | 767,074 | 743,881 | -3.00% | 58.94% |
| Number of workers = 3+ | 144,768 | 143,214 | -1.10% | 51.13% |
| Major university student living on-campus in dorms | 32,755 | 32,752 | 0.00% | 4.51% |
| Major university student living off-campus | 99,169 | 98,993 | -0.20% | 52.73% |
| **TAZ Controls** | | | | |
| Workers with WhiteCollar Occupation | 1,342,691 | 1,387,671 | 3.30% | 16.26% |
| Workers with Services Occupation | 299,094 | 307,224 | 2.70% | 15.18% |
| Workers with Health Occupation | 166,712 | 172,683 | 3.60% | 14.89% |
| Workers with Retail Occupation | 508,199 | 522,880 | 2.90% | 16.38% |
| Workers with BlueCollar Occupation | 556,820 | 568,955 | 2.20% | 13.62% |
| **Meta Controls** | | | | |
| Age 0 to 14 years | 1,194,984 | 1,236,839 | 3.50% | 5.70% |
| Age 15 to 24 years | 780,700 | 802,633 | 2.80% | 5.85% |
| Age 25 to 34 years | 866,698 | 892,806 | 3.00% | 5.62% |
| Age 35 to 44 years | 829,438 | 857,093 | 3.30% | 5.82% |
| Age 45 to 54 years | 834,252 | 861,636 | 3.30% | 5.94% |
| Age 55 to 64 years | 734,960 | 759,685 | 3.40% | 5.77% |
| Age 65 to 74 years | 487,607 | 502,656 | 3.10% | 5.21% |
| Age 75 to 84 years | 214,670 | 220,778 | 2.80% | 4.57% |
| Age 85+ | 64,047 | 65,775 | 2.70% | 4.57% |
The results in Table 2-1 indicate that the population synthesizer matches the controls well overall. The regional differences between synthesized and control totals are no more than 4% in absolute value for every control, and 0.3% or less for the household size, income, and university student controls. The generally low % RMSE values show that the synthesizer also matches most controls well at each of the three geographies (MAZ, TAZ, and meta). However, the zone-level fit is weaker for controls that represent a relatively small market, such as household size 6+ (43.53% RMSE) or workers = 0 (106.99% RMSE). This is likely an artifact of constraining many dimensions simultaneously, combined with the scarcity of seed records for these market segments, which makes it harder to sample the correct number of records that satisfy all of the constraints in a specific zone.
Finally, Figure 2-1 visualizes the mean bias and ± one standard deviation of the error for each control, which reinforces the findings above.
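The quantities plotted in the figure can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce Figure 2-1. It assumes the bias is the signed zone-level error (synthesized minus control), that the plotted band is the population standard deviation of that error, and that `bias_band` is a hypothetical helper name.

```python
import statistics
from typing import Sequence, Tuple


def bias_band(controls: Sequence[float], synthesized: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean bias, mean - 1 SD, mean + 1 SD) of zone-level errors.

    Assumption: error = synthesized - control per zone, and the band uses the
    population standard deviation (statistics.pstdev).
    """
    if len(controls) != len(synthesized) or not controls:
        raise ValueError("need equal-length, non-empty zone lists")
    errors = [s - c for c, s in zip(controls, synthesized)]
    mean_bias = statistics.mean(errors)
    sd = statistics.pstdev(errors)
    return mean_bias, mean_bias - sd, mean_bias + sd


# Hypothetical four-zone example: zero mean bias, but a non-zero spread.
mean_bias, lower, upper = bias_band([100, 50, 0, 30], [98, 53, 1, 28])
print(mean_bias, lower, upper)
```

A mean bias near zero with a narrow band indicates a control that is matched well zone by zone. A wide band around a near-zero mean is the pattern expected for the small-market controls discussed above.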
Figure 2-1. ARC Controls Validation