CAPI Sampling Strata

This report is released to inform interested parties of research and to encourage discussion. Any views expressed on (statistical, methodological, technical, or operational) issues are those of the author(s) and not necessarily those of the U.S. Census Bureau.


The American Community Survey (ACS) collects data in three phases: mailout/mailback, Computer Assisted Telephone Interview (CATI), and Computer Assisted Personal Interview (CAPI). During the CAPI phase, a sample of mail/telephone non-respondents and addresses deemed to be unmailable is selected for a personal visit. The sampling rate for this process is determined by the CAPI sampling stratum for the tract in which the address resides, which is assigned based upon the mail/telephone cooperation rate of the tract. The initial assignment was made prior to 2005. Based upon recent ACS data, cooperation rates were re-calculated for all tracts, which were then assigned to new CAPI sampling strata. This paper discusses the data that led us to revisit the CAPI sampling stratum assignments, and provides the methodology used to calculate the tract level mail/telephone cooperation rates used to assign each tract to a CAPI sampling stratum.
Keywords: ACS, Sampling, Nonresponse, Personal Interview

  1. Introduction

The ACS is a rolling monthly survey that collects socio-economic and demographic data describing the population and housing inventory in the United States similar to what was historically collected on the census long form. From 2000 to 2005, 1,239 counties were included in the Census 2000 Supplemental Survey (C2SS), with the addition of Broomfield County, Colorado in 2002. In 2005, the sample was expanded to include all 3,141 counties and county equivalents in the United States. All 78 municipios in Puerto Rico are included in a separate survey, the Puerto Rico Community Survey, which uses the same basic design as the ACS.

  2. Sample Design

The ACS selects a sample of housing unit addresses from the Master Address File (MAF) twice a year [2]. Main sampling occurs in August/September of the year prior to the sample year. Approximately 99 percent of the total annual ACS sample is selected at this time. In January of the sample year, a sample of addresses that have been added to the MAF since the Main MAF extracts were created is selected. This is known as Supplemental sampling and accounts for approximately one percent of the total annual ACS sample [5].
The ACS employs three modes of data collection: mail, CATI, and CAPI. In general, questionnaires not completed during one phase are sent to the next phase of data collection. The exception to this is for unmailable addresses, which are sampled and sent directly to the CAPI mode. A sample of the mail/CATI non-responding addresses is also sent to CAPI for follow-up.
Prior to March 2005, all addresses that did not respond to the mailout questionnaire and that were not reached during the CATI phase of data collection were sampled for CAPI at a flat 1-in-3 rate. In order to produce estimates of comparable reliability for all areas, we developed methodology to differentially sample non-responding addresses for CAPI at the tract level by increasing the CAPI sampling rates in tracts with low combined mail/CATI cooperation rates. To offset the additional cost associated with increasing the number of CAPI cases, the differential CAPI sampling plan was designed to be cost neutral. This was accomplished by reducing the initially selected sample in blocks with the two lowest sampling rates in tracts with the highest expected mail/CATI cooperation rates. The sample reduction in these areas began with the Main 2005 sample selection in August-September of 2004 and has been reflected in the mailout beginning with the January 2005 panel. Note that the CAPI sampling rate for all unmailable addresses was not changed and remains at 2-in-3 [4].

  3. Research Question

Our research was designed to answer the following questions:
Does the allocation of tracts to differential CAPI sampling strata need to be revised? If so, how will this be accomplished?
The answer to the overarching research question rested on several secondary questions:
1. Does the cooperation rate for each individual stratum fall within the predicted range for that stratum at various levels of geography? This will enable us to assess how well the current design is working overall.
2. What is the distribution of the tract level cooperation rates by assigned CAPI sampling strata? This will assess how well the current design is working for each tract.
3. How should the vacancy rate component of the cooperation rate be calculated? The vacancy rate has a significant impact on the cooperation rate calculation. Any change to the vacancy rate calculation will have an impact on the design.
4. What impact is there on the CAPI workload from a new allocation of tracts to the CAPI sampling strata? Any new allocation of tracts into new CAPI strata will impact the CAPI workloads and could necessitate a change to the CAPI reduction factor as the redesign was also constrained to be cost neutral.
In order to address these questions, indicators related to cooperation and return rates, expected sample size, and cost estimates for small levels of geography were created. We classified each tract into a CAPI sampling stratum. The cooperation rates, which remove an estimate of the number of vacant and deleted units from the denominator, were used to group tracts by level of cooperation. The return rates from the mail, CATI, and CAPI phases of data collection were used in conjunction with the estimated sample sizes and cost estimates to determine a new value of the reduction factor.

  4. Previous CAPI Sample Design

4.1 Combined Mail/CATI Cooperation Rate Calculation
We excluded vacant housing units from the denominator of the cooperation rate so that a high vacancy rate alone will not cause a tract to be sampled at a higher rate. The final estimate of the cooperation rate was defined as the weighted total of mail and CATI interviews divided by the estimated number of occupied mailable addresses. We calculated the cooperation rates using C2SS data from 2000 through 2003. Thus, they could only be calculated for about 50,000 tracts in the nation. For the approximately 15,000 tracts with no ACS data, cooperation rates were modeled using a simple linear model from the Census 2000 long-form cooperation rates.
4.2 CAPI Stratum Design
The initial research analyzed several design options. There were two important factors considered: 1) What cooperation rate cut-offs were appropriate to use in assigning the CAPI sampling strata and 2) What reduction factor should be used in the high cooperation rate strata to keep the sample design cost neutral. Table 1 shows the cut-off values in the final design.

These cut-off values were chosen for several reasons. They help increase the overall interview rate for groups with typically low response patterns. This increase in the number of interviews in turn reduces the variance of the published estimates. In addition, these cut-offs provided a good balance between the number of tracts where the sampling rate would increase and maintaining a reasonable reduction factor. The CAPI sample was increased in approximately 11,000 tracts while the original sample was reduced by 8% in roughly 42,000 tracts. These are the tracts that are in CAPI sampling stratum 4 and contain blocks with high population density. Tracts in CAPI sampling stratum 4 but in areas of low population density do not undergo sample reduction. Thus, a significant number of areas were impacted by increasing the sampling rate while spreading out the compensating sample reduction.
5. Decisions, Assumptions, and Limitations
5.1 Decisions and Assumptions
The following questions needed to be answered prior to completing the research.
1. Should an estimate of deleted units be removed from the denominator of the cooperation rate?
In the initial research, the cooperation rate was calculated by taking the total weighted number of mail and CATI interviews together divided by the estimated number of occupied mailable addresses. The number of occupied mailable addresses was estimated by subtracting the estimated number of vacant mailable addresses from an estimate of the number of mailable addresses. In this way, a high vacancy rate would not cause an artificially low cooperation rate. The same logic can be applied to deleted units as these should not “hurt” a tract’s cooperation rate. We decided to remove an estimate of the deleted units from the denominator of the cooperation rate.
Section on Survey Research Methods – JSM 2008
2. Should the CAPI completion rate be taken into account since the overall goal is to produce estimates of comparable reliability across all areas?
The methodology used in the assignment of the differential CAPI sampling strata was based entirely upon the mail/CATI cooperation rate. If there was a large change for certain types of areas in the CAPI completion rate, then it might be possible to use this rate to better determine the CAPI stratum assignment. A potential problem with using the CAPI completion rate is that there are only about 45,000 CAPI cases a month, drastically increasing the variance of any estimates as compared with the mail/CATI cooperation rate (230,000 cases a month). In addition, after calculating the CAPI completion rates, more than 90% of tracts had a completion rate greater than 70%, making it difficult to delineate cut-offs. For these reasons, the CAPI completion rate was not used to redesign the CAPI sampling.
3. Will there be a major change to the global design of the CAPI sample?
Our goal was not to change the basic sample design itself but to assess the effectiveness of the current design and to update the designation of the sampling strata by taking advantage of ACS interview data. The number of CAPI sampling strata as well as the cooperation rate ranges for those strata was not changed. While the design did not change, after the new sampling strata designations were determined, a new reduction factor was calculated to maintain the cost-neutral design.
5.2 Limitations
We were limited in our research by the following:
1. Ideally, weighted observed cooperation rates would be calculated at the tract level using data cumulated over several years of full implementation data collection. However, the differential CAPI sampling rates have only been in use since the selection of the March 2005 CAPI sample corresponding to the January 2005 panel. Therefore the number of sample cases that we were able to include in the analysis was less than optimal.
2. Sample cases in Remote Alaska are not included in this analysis. They were excluded from the original CAPI research also because these cases are unmailable, and therefore contribute zero to the cooperation rate. They are sampled for CAPI at a 2-in-3 rate.
3. Only geocoded records (records for which we know the block in which they are located) can be included in the cooperation rate calculations. All ungeocoded records are placed into a CAPI sampling stratum, designated as stratum ‘5’-out of scope, during the initial sample selection and are sampled for CAPI at a rate of 1-in-3 with no reduction. Also, tracts with an unmailable rate greater than 25% are placed into stratum 5 since unmailable addresses are sampled for CAPI at a 2-in-3 rate. Thus, the cooperation rates for CAPI stratum 5 include those tracts where there was either insufficient data available at the time of the initial CAPI strata research or an unmailable rate greater than 25%.
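The stratum rules mentioned so far can be illustrated with a minimal sketch. It assumes that strata 1 through 4 are separated by three ascending cooperation-rate cut-offs and that a rate falling exactly on a cut-off belongs to the higher stratum; neither detail is stated in the text. The function name, its parameters, and the cut-off values in the usage note are hypothetical and are not the values from Table 1.

```python
from bisect import bisect_right
from typing import Optional, Sequence

OUT_OF_SCOPE = 5  # stratum '5' in the text


def assign_capi_stratum(
    cooperation_rate: Optional[float],
    unmailable_rate: float,
    geocoded: bool,
    cutoffs: Sequence[float],
) -> int:
    """Return a CAPI sampling stratum: 1 (lowest cooperation) to 4, or 5.

    cooperation_rate and unmailable_rate are percentages.
    cutoffs holds three ascending cooperation-rate boundaries (assumed form).
    """
    # Ungeocoded records, or tracts without a usable rate, are out of scope.
    if not geocoded or cooperation_rate is None:
        return OUT_OF_SCOPE
    # Tracts with an unmailable rate above 25% are also placed in stratum 5.
    if unmailable_rate > 25.0:
        return OUT_OF_SCOPE
    # Count how many cut-offs the rate meets or exceeds, then shift to 1-4.
    return bisect_right(cutoffs, cooperation_rate) + 1
```

With placeholder cut-offs such as `[30, 45, 60]`, a geocoded tract with a low cooperation rate lands in stratum 1 and one with a high rate lands in stratum 4, while any tract with an unmailable rate above 25% goes to stratum 5 regardless of its cooperation rate.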
6. Methodology Used
6.1 Design Area
The design area for the differential CAPI sampling is the Census Tract. The allocation of tracts to the sampling strata produces a file with one record for each of the 65,443 census tracts and one record representing ungeocoded records. This differs from the current methodology in that the initial CAPI research was done using the 2003 current geography and the subsequent allocation of differential CAPI sampling strata was done using current state (based on sample year)/current county (based on sample year)/tract. Due to constantly shifting boundaries, some values of this combination are not on the file used to assign the sampling stratum. By using Census Tract, all addresses will match and will be assigned to a sampling stratum instead of defaulting to CAPI stratum 5 and being sampled at the default rate of 1-in-3.
6.2 Cooperation Rate Calculation
1. Cooperation Rate
We chose a measure of mail/CATI cooperation that excludes vacants and deletes from the denominator. We should, therefore, be able to better target areas where households are less likely to respond by mail or complete a telephone interview. Response data to calculate the cooperation rates were drawn from the February 2005 through December 2006 monthly panels. January 2005 was not included due to CAPI workload reductions in some counties while ramping up to full ACS implementation levels.
2. The following components were calculated within each state/county/tract combination:
• Valid, geocoded, and mailable sample addresses (MAIL): This number is tallied from the January 2005 and January 2006 MAF extracts.
• Valid, geocoded, and mailable vacant sample addresses (VAC): This number is estimated by applying the vacancy rate to MAIL.
• Valid, geocoded, and mailable deleted sample addresses (DEL): This number is estimated by applying the deletion rate to MAIL.
• Weighted number of valid, geocoded, mailable interviews from the mail and CATI modes of data collection (INTS): This component is calculated by summing the unbiased sampling weights for all mail and CATI interviews.
These components were summed to the Census Tract level within CAPI stratum and the weighted cooperation rate (expressed as a percentage) was calculated:
Cooperation Rate = 100 × INTS / (MAIL − VAC − DEL)
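A minimal sketch of this calculation follows. It assumes the rate is the weighted interview total INTS divided by MAIL less the estimated vacant and deleted addresses, which is how the components are described in words above. The function name and the figures in the usage note are hypothetical.

```python
def cooperation_rate(
    ints: float,
    mail: float,
    vacancy_rate: float,
    deletion_rate: float,
) -> float:
    """Weighted mail/CATI cooperation rate for one tract, as a percentage.

    ints: weighted number of mail and CATI interviews (INTS)
    mail: valid, geocoded, mailable sample addresses (MAIL)
    vacancy_rate, deletion_rate: proportions between 0 and 1
    """
    vac = vacancy_rate * mail      # VAC: vacancy rate applied to MAIL
    dele = deletion_rate * mail    # DEL: deletion rate applied to MAIL
    occupied = mail - vac - dele   # estimated occupied mailable addresses
    if occupied <= 0:
        raise ValueError("no occupied mailable addresses remain")
    return 100.0 * ints / occupied
```

For example, a tract with 1,000 mailable addresses, an 8% vacancy rate, and a 2% deletion rate has 900 addresses in the denominator, so a high vacancy rate on its own does not lower the tract's cooperation rate.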

6.3 Maintaining Costs
1. Reduction Factor
After allocating tracts to the CAPI strata based on the new cooperation rates, the value of the reduction factor was adjusted for a cost neutral design. The reduction factor is used to decrease the initial sample size in the highest cooperation rate tracts (CAPI sampling stratum ‘4’) to account for the differential sampling in the low responding tracts. The sample is only reduced in blocks that are densely populated and that are in the two lowest initial sampling strata.
2. Costs
Costs were maintained based upon 2006 response data. The unit costs we used for each mode of data collection are:
MAILC = $12
CATIC = $15
CAPIC = $135
Note that if a sampled address is in the MAIL, CATI, and CAPI phases, then the total cost for that address is cumulative ($162).
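The cumulative cost rule can be sketched as below. The unit costs come from the list above; the function name and the idea of passing the phases as a list are illustrative assumptions, as is the treatment of an unmailable address as incurring only the CAPI cost because it is sent directly to CAPI.

```python
# Unit costs in dollars per mode, as listed in the text.
UNIT_COST = {"mail": 12, "cati": 15, "capi": 135}


def address_cost(phases):
    """Total cost of one sampled address given the phases it passed through."""
    return sum(UNIT_COST[phase] for phase in phases)


# An address that goes through all three phases costs 12 + 15 + 135.
all_phases = address_cost(["mail", "cati", "capi"])
# Assumed: an unmailable address sent directly to CAPI incurs only the CAPI cost.
direct_capi = address_cost(["capi"])
```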
3. The following components were calculated at the national level. Group 1 contains all cases where sample reduction does not occur and group 2 contains all cases where sample reduction does occur. Neither group includes Remote Alaska.
• Total number of mailable addresses in 2006 (MAIL)
• Total number of addresses that went to CATI in 2006 (CATI)
• Total number of addresses that went to CAPI in 2006 excluding Remote Alaska (CAPI)
• The estimated number of mail cases in group ‘k’ (EXMk)
• The estimated number of CATI cases in group ‘k’ (EXCTk)
• The estimated number of CAPI cases in group ‘k’ with the new value of CSTRM (EXCPk)
4. The components listed in 6.3.2 and 6.3.3 were input into the following cost formula:

This was solved for CFACTOR, which represents the reduction factor and led to the following equation:
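The cost-neutrality idea can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the text: that cost neutral means the expected cost under the new allocation equals the 2006 cost, that the expected group 2 counts are taken before any reduction, and that CFACTOR scales the group 2 mail, CATI, and CAPI workloads by the same proportion. The class, the function, and all workload figures in the usage note are hypothetical.

```python
from dataclasses import dataclass

# Unit costs in dollars, as listed in 6.3.2.
MAILC, CATIC, CAPIC = 12.0, 15.0, 135.0


@dataclass
class Workload:
    mail: float  # number of mail cases
    cati: float  # number of CATI cases
    capi: float  # number of CAPI cases

    def cost(self) -> float:
        """Total cost of this workload at the unit costs above."""
        return MAILC * self.mail + CATIC * self.cati + CAPIC * self.capi


def reduction_factor(baseline: Workload, group1: Workload, group2: Workload) -> float:
    """Solve baseline = cost(group1) + CFACTOR * cost(group2) for CFACTOR.

    baseline: 2006 totals (MAIL, CATI, CAPI)
    group1:   expected cases where no reduction occurs (EXM1, EXCT1, EXCP1)
    group2:   expected unreduced cases subject to reduction (EXM2, EXCT2, EXCP2)
    """
    return (baseline.cost() - group1.cost()) / group2.cost()
```

Given made-up workloads such as `Workload(1000, 400, 150)` for the baseline, `Workload(600, 250, 100)` for group 1, and `Workload(450, 170, 60)` for group 2, the returned factor is the share of the group 2 sample that can be kept while total cost stays at the baseline level.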

7. Results and Analysis
7.1 Behavior of the Initial CAPI Stratum Designation
We investigated the existing CAPI sampling stratum designation to see if the tract allocation was supported based on the current cooperation rates. We used the same vacancy rate for each tract that was used in the original research. The new CAPI stratum allocation is compared to the old allocation in the following tables.

Table 2 shows the distribution of valid addresses from the January 2007 MAF extract by the original allocation and new allocation of the CAPI sampling stratum. More than half of all addresses that were in stratum 1 are now in strata with higher cooperation rates. Both strata 2 and 3 show movement of about 45% of their addresses into higher responding strata, and almost 80% of the addresses in stratum 4 remain in stratum 4. Even though a smaller percentage of addresses are moving out of stratum 4 than out of the other strata, the larger number of addresses initially in stratum 4 means that the total number of addresses moving is not very different.
Table 2 also shows the net movement of addresses by showing the total percentage of valid addresses within each stratum for both the current and new allocations. Both strata 1 and 4, the strata with the most extreme rates, lost addresses while the more moderate responding strata, 2 and 3, as well as stratum 5, saw a slight increase in their percentages. The last row of Table 2 shows the total percentage of addresses that did not change CAPI stratum. This number is only 69%, meaning that about 31% of addresses would fall into a different CAPI stratum using observed cooperation rates.
Table 3 shows the same distributions as Table 2 for the movement of tracts, not valid addresses. This table shows that a larger percentage of tracts are moving than valid addresses, meaning that the tracts that are moving are smaller tracts with fewer addresses than the tracts remaining in the same CAPI strata. Stratum 1 even shows a slight increase in total percentage instead of the slight decrease seen in Table 2. This indicates that the valid addresses seen here are in small tracts.
In addition to movement of tracts across CAPI strata, we examined the cooperation rate within current strata at several levels of geography. Figure 1 shows the national level cooperation rate within each stratum. This graph shows the expected distribution with the lowest cooperation in stratum 1 and the cooperation rate increasing for each subsequent stratum. The cooperation rate for stratum 5 is shown for completeness, though tracts in this stratum are out of scope.

Figures 2 and 3 break the nation into two groups: C2SS counties and expansion counties. These were especially important to look at since only the C2SS counties had data at the time of the original research. The expansion counties had no ACS data; their cooperation rates were modeled from Census 2000 Long Form data. The distribution looks good for the C2SS counties, but the expansion counties show an unexpected distribution. Instead of a steady increase in cooperation rates across all strata, the cooperation rates are relatively flat. This was a concern because it indicates that the current allocation of CAPI strata is oversampling in areas of high response, and that the design could be more efficient.

From Tables 2 and 3, we see that there is quite a bit of change in the distribution of tracts and addresses across the CAPI strata. In addition, in the 1,901 expansion counties, the current CAPI stratum designation is inconsistent with their current cooperation rate. Both of these facts indicate that the allocation of the differential CAPI sampling strata needs to be revised to better reflect the actual ACS cooperation across all tracts.
7.2 Different Research Designs
Since there are now almost two years of ACS data for all counties that use the differential CAPI sampling design, it was possible to use that data to calculate the tract level vacancy rates instead of Census 2000 data. The ACS data is more relevant and more likely to reflect the current state of vacancy across the nation, but since it is survey data it has associated variance.
To see just how much change there would be, the correlation between the two vacancy rates was calculated. With a minimum ACS sample of 1 to calculate a vacancy rate, ρ = 0.7. With a minimum sample of 10, ρ = 0.72, while with a minimum sample of 50, ρ = 0.77. This shows that there is a strong positive correlation, but not so large that no information would be gained from using ACS data.
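A comparison of this kind can be sketched as follows, assuming ρ is the ordinary Pearson correlation computed over tracts whose ACS sample meets the minimum size. The text does not say whether the correlation was weighted, so that is an assumption, and the tuple layout and function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def vacancy_correlation(tracts, min_sample):
    """Correlate Census and ACS vacancy rates over sufficiently sampled tracts.

    tracts: iterable of (census_vr, acs_vr, acs_sample_size) tuples.
    """
    kept = [t for t in tracts if t[2] >= min_sample]
    return pearson([t[0] for t in kept], [t[1] for t in kept])
```

Raising `min_sample` drops tracts whose ACS vacancy rate rests on very few cases, which is consistent with the correlation rising as the minimum sample grows.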
One way to combat the problem of variance in the ACS vacancy rate estimate is to create a composite vacancy rate by combining it with the Census 2000 vacancy rate. This has the effect of decreasing the variance by 75% while still using the information gained from the ACS. This was done in the following manner:
Composite Vacancy Rate = 0.5 × (Census VR + ACS VR)
Therefore the variance is:
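A short derivation consistent with the 75% figure is sketched below. It assumes the Census 2000 vacancy rate is treated as a fixed quantity with no sampling variance, so that only the ACS vacancy rate contributes variance; the text does not state this assumption explicitly.

```latex
\begin{align*}
\operatorname{Var}(\text{Composite VR})
  &= \operatorname{Var}\bigl(0.5 \times (\text{Census VR} + \text{ACS VR})\bigr) \\
  &= 0.5^{2} \times \operatorname{Var}(\text{ACS VR}) \\
  &= 0.25 \times \operatorname{Var}(\text{ACS VR})
\end{align*}
```

Under that assumption the composite rate has one quarter of the variance of the ACS vacancy rate, which is a 75% decrease.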

Another factor we considered was whether special treatment outside of the two previous options might be warranted. After reviewing the available information, we decided that using ACS data in the composite vacancy rate would provide a vacancy rate that involved recent information and have an appropriate impact in all areas. In addition, this maintains a consistent approach across the nation.
7.3 Impact on Valid Address and Tract Distributions
The valid address and tract distributions for the Census Vacancy option can be seen in Tables 2 and 3. The valid address distribution for the Composite Vacancy option can be seen in Table 4. It shows a distribution similar to that of the Census Vacancy option, with almost the same percent of addresses seeing no change. But there is a slight trend toward having fewer addresses in the lower responding strata. Both strata 1 and 2 have about half a percentage point less, and there is slightly more than one percent more addresses in stratum 4 than in the Census Vacancy option. This is because the vacancy rates from ACS tend to be slightly higher than those from Census 2000, resulting in slightly higher cooperation rates.

Table 5 shows the tract distributions for the Composite Vacancy option. Overall it shows the same general distribution as the Census Vacancy option in Table 3, but as shown in the valid address distribution, there is a slight change in the total percentage distribution with about 1% more tracts in stratum 4 under the Composite Vacancy option than in the Census Vacancy option.
Even though the overall distributions are close, that does not necessarily imply that individual tracts are being allocated to the same CAPI stratum under all designs. For example, Table 6 shows the number of tracts that changed CAPI stratum between the Census Vacancy and Composite Vacancy designs. From the table, 518 tracts moved out of stratum 1, while only 118 moved into stratum 1. So there was a net loss of 400 tracts from stratum 1 by changing from the Census Vacancy to the Composite Vacancy option. Similarly, strata 2 and 3 also had a net loss of tracts while stratum 4 gained a total of 823 tracts. This is what we would expect from the previous tables. So while most of the country will not differ between designs, the total number of tracts affected, 4,428, is a significant portion of all tracts at 7%.
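The net movement figures can be tallied with a small sketch like the one below, assuming each tract is represented by a pair of its stratum under the two designs. The function name and the toy pairs in the usage note are hypothetical and are not taken from Table 6.

```python
from collections import Counter


def net_movement(assignments):
    """Net gain (+) or loss (-) of tracts per stratum between two designs.

    assignments: iterable of (old_stratum, new_stratum) pairs, one per tract.
    Only strata that gained or lost at least one tract appear in the result.
    """
    moved_out = Counter()
    moved_in = Counter()
    for old, new in assignments:
        if old != new:          # tracts that keep their stratum are ignored
            moved_out[old] += 1
            moved_in[new] += 1
    strata = sorted(set(moved_out) | set(moved_in))
    return {s: moved_in[s] - moved_out[s] for s in strata}
```

Applied to the stratum 1 figures in the text, 118 tracts moving in and 518 moving out give 118 - 518 = -400, the net loss of 400 tracts. Because every move leaves one stratum and enters another, the net changes across all strata sum to zero.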

7.4 Impact on CAPI Workloads
Table 7 shows the national CAPI workload estimates based on 2006 data. It also shows the corresponding reduction factor for each design to maintain costs across the survey. The current reduction factor is 0.92, so there is not much change in any of the designs. This is because even though there is significant change in where specific valid addresses are located, the overall distribution of the nation as a whole did not change a lot for any of the designs.

The Census Vacancy option has the larger increase in the workload of almost 1,900 over the course of the year. That is about 160 cases a month for the whole nation. The Composite Vacancy option had an increase in the workload less than half that at about 800 cases a year, or a little more than 65 a month. The differences between the designs will be localized to the 4,458 tracts from the previous table, so while there is not much difference nationally, there could be an impact at smaller levels of geography.
8. Conclusion
There were some problems evident in the allocation of the initial CAPI stratum designation. In the 1,901 expansion counties, the cooperation rates for the CAPI sampling stratum did not correspond to the expected levels. This was indicative of inefficiencies in the initial CAPI sampling design and is not totally unexpected given the model used.
Both of the designs achieved a similar valid address and tract distribution. The only difference was the slight increase in the percentage of the universe in stratum 4. But even though the overall distributions were similar, there was a moderately sized difference in which tracts were being allocated to which CAPI sampling strata, showing that the ACS data used in the Composite Vacancy option did impact the results.
The movement in tracts did exhibit some noticeable changes in the CAPI workloads and the calculation of the reduction factor. The Census Vacancy option had a reduction factor around 0.91 while the Composite Vacancy option had a reduction factor around 0.92, which is the current reduction factor. This larger reduction factor also corresponds to a smaller increase in the CAPI workload.
The larger reduction factor for the Composite Vacancy option is also a symptom of having more addresses in stratum 4. This spreads out the sample reduction across more addresses, which reduces its impact on any specific area of the country. The Composite Vacancy reduction factor was the same as the initial reduction factor, maintaining consistency across the years.
The Composite Vacancy option takes advantage of the timeliness of the ACS data while lowering the variance with the Census 2000 vacancy rate. The final decision was to use the Composite Vacancy option to reallocate the differential CAPI sampling strata. These CAPI sampling stratum allocations were used for the first time when selecting the 2008 housing unit ACS sample [3].

We wish to thank the following people who helped us by assisting in analyzing the data contained in this paper, or by providing many clear and useful comments and suggestions: Mark Asiala, Karen E. King, Alfredo Navarro.
[1] Asiala, M. (2005). “American Community Survey Research Report: Differential Sub-Sampling in the Computer Assisted Personal Interview Sample Selection in Areas of Low Cooperation Rates”. 2005 ACS Documentation Series ACS-DOC-2.
[2] Bates, Lawrence. (2006). “Editing the MAF Extracts and Creating the Unit Frame Universe for the American Community Survey”. 2007 ACS Universe Creation Series ACS07-UC-2.
[3] Castro, Edward. (2007). “American Community Survey: Review of the Computer Assisted Personal Interview Sample Design”. 2007 ACS Research Series ACS07-R-9.
[4] Hefter, Steven. (2005). “American Community Survey: Specifications for Selecting the Computer Assisted Personal Interview Samples”. 2005 ACS Sampling Series ACS-S-45.
[5] Hefter, Steven. (2006). “Specifications for Selecting the Main and Supplemental Housing Unit Address Samples for the American Community Survey”. 2007 ACS Sampling Series ACS07-S-3.



In his research work ‘Redesigning the American Community Survey Computer Assisted Personal Interview Sample’, Castro analyses data collected by the American Community Survey (ACS). The American Community Survey gathers socio-economic and demographic data for the United States. These data describe two things: the population and the housing inventory.

Data collection is done in three main stages, namely mailout/mailback, Computer Assisted Telephone Interview (CATI), and Computer Assisted Personal Interview (CAPI). A sample of housing unit addresses is selected from the Master Address File (MAF) twice a year. Main sampling is carried out in August or September of the year before the sample year, and Supplemental sampling follows in January of the sample year. Questionnaires not completed during one phase are generally sent to the next phase of data collection. Unmailable addresses are the exception: they are sampled and sent directly to Computer Assisted Personal Interview.

In the Computer Assisted Personal Interview phase, a personal visit is made to a sample of addresses that are unmailable or that did not respond by mail or telephone. Before March 2005, addresses that did not respond to the mailed questionnaire and were not reached through Computer Assisted Telephone Interview were sampled for Computer Assisted Personal Interview at a flat one-in-three rate.

A new technique was developed to ensure that the estimates generated from the data were of comparable reliability for all areas. Non-responding addresses were sampled differentially for Computer Assisted Personal Interview at the tract level, with higher sampling rates in tracts with low combined mail/telephone cooperation rates. Because this would increase cost, the differential Computer Assisted Personal Interview sampling plan was built to be cost neutral.

The sampling rate for Computer Assisted Personal Interview is determined by the sampling stratum of the tract in which the address lies. Each tract is assigned to a stratum on the basis of its mail/telephone cooperation rate. Using recent American Community Survey data, cooperation rates were re-calculated for all tracts, which were then assigned to new Computer Assisted Personal Interview sampling strata. The paper sets out the method used to calculate the tract level mail/telephone cooperation rates that determine the stratum to which each tract is assigned.


The main objective of this research is to determine whether the allocation of tracts to differential Computer Assisted Personal Interview sampling strata needs to be revised and, if so, how this should be done. Answering this requires addressing several secondary questions. To analyse how the existing design is performing overall, it is necessary to check whether the cooperation rate for each individual stratum falls within the predicted range for that stratum at various levels of geography.

To evaluate how the existing design is working for each tract, it is necessary to determine the distribution of the tract level cooperation rates by assigned Computer Assisted Personal Interview sampling stratum. Another important factor is the vacancy rate, which has a significant impact on the cooperation rate calculation. Any change to the vacancy rate calculation will therefore affect the design, so it is necessary to decide how the vacancy rate should be calculated. Finally, a new allocation of tracts to strata affects the Computer Assisted Personal Interview workload, and this impact should be estimated.

To address these questions, indicators related to cooperation and return rates, expected sample size, and cost estimates for small levels of geography were developed. Tracts were grouped by level of cooperation using the cooperation rates. In the initial research, the cooperation rate was calculated as the weighted total of mail and Computer Assisted Telephone Interview interviews divided by the estimated number of occupied mailable addresses. For tracts with no American Community Survey data, cooperation rates were modelled with a simple linear model from the Census 2000 long-form cooperation rates.

For the design of the Computer Assisted Personal Interview strata, two main factors were considered: the appropriate cooperation rate cut-offs and the reduction factor. The cut-offs were chosen so that they would help increase the overall interview rate for groups with typically low response. The higher number of interviews in turn lowers the variance of the published estimates. The cut-offs also balanced the number of tracts where the sampling rate would increase against maintaining a reasonable reduction factor.


Initially, the number of occupied mailable addresses was estimated by subtracting the estimated number of vacant mailable addresses from the estimated number of mailable addresses. This meant that a high vacancy rate would not produce an artificially low cooperation rate. Applying the same logic, it was decided to remove an estimate of deleted units from the denominator of the cooperation rate as well.

For allocation of differential Computer Assisted Personal Interview sampling strata, only the mail/Computer Assisted Telephone Interview cooperation rate was used. A major issue with using the Computer Assisted Personal Interview completion rate as well is that there are only about 45,000 such cases per month, compared with about 230,000 per month for the mail/Computer Assisted Telephone Interview cooperation rate. This results in a much higher variance of the estimates. Also, after evaluation of the Computer Assisted Personal Interview completion rate, it was found that more than 90% of tracts had a completion rate higher than 70%, which made it difficult to set cut-offs. Hence the Computer Assisted Personal Interview completion rate was not used in the redesign.

The objective of the research was to improve the sampling strata using ACS interview data. Because differential CAPI sampling rates were used only after 2005, the assessment rests on fewer sample cases than would be optimal. In addition, the Remote Alaska region was not part of this analysis. These are limitations of the method.


The unit of design for this method is the census tract. There were 65,443 census tracts, each with one record assigning it to a sampling stratum. The census tract is used because every address can be matched to a tract and thus correctly assigned to a sampling stratum. The cooperation rate is calculated with the mail/CATI cooperation metric that excludes vacant units. This better targets areas where households usually do not respond by mail or telephone. The data used to calculate cooperation rates were collected monthly from February 2005 to December 2006.

The reduction factor was adjusted so that the design would be cost neutral. It reduces the initial sample size in the tracts with higher cooperation rates to offset the differential sampling in the tracts with lower cooperation rates. Cost figures were taken from 2006 data.
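
One way to picture a cost-neutral reduction factor is the sketch below. It assumes a deliberately simple cost model that is not taken from the paper: each sampled address incurs a mail/CATI cost, and the share left for CAPI follow-up incurs a CAPI cost at the stratum's subsampling rate. The `Stratum` class, the function names, the unit costs, and all stratum figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    initial_sample: float       # addresses selected before any reduction
    capi_eligible_share: float  # share of sampled addresses left for CAPI
    capi_rate: float            # CAPI subsampling rate in this stratum
    reduced: bool               # True if the reduction factor applies here


def stratum_cost(stratum, mail_cati_cost, capi_cost):
    """Expected cost of one stratum with no reduction applied."""
    per_address = (mail_cati_cost
                   + stratum.capi_eligible_share * stratum.capi_rate * capi_cost)
    return stratum.initial_sample * per_address


def cost_neutral_reduction_factor(strata, budget, mail_cati_cost, capi_cost):
    """Solve budget = fixed + r * scalable for the reduction factor r."""
    fixed = sum(stratum_cost(s, mail_cati_cost, capi_cost)
                for s in strata if not s.reduced)
    scalable = sum(stratum_cost(s, mail_cati_cost, capi_cost)
                   for s in strata if s.reduced)
    if scalable <= 0:
        raise ValueError("at least one stratum must take the reduction")
    return (budget - fixed) / scalable


# Hypothetical inputs: the unit costs and stratum figures are made up.
MAIL_CATI_COST, CAPI_COST = 10.0, 100.0
baseline = [Stratum(1000, 0.6, 1 / 3, False), Stratum(9000, 0.4, 1 / 3, True)]
differential = [Stratum(1000, 0.6, 0.5, False), Stratum(9000, 0.4, 1 / 3, True)]

# The budget is the cost of the baseline design with a uniform CAPI rate.
budget = sum(stratum_cost(s, MAIL_CATI_COST, CAPI_COST) for s in baseline)
factor = cost_neutral_reduction_factor(differential, budget,
                                       MAIL_CATI_COST, CAPI_COST)
```

Raising the CAPI rate in the low-cooperation stratum increases its cost, so the factor that restores the baseline budget comes out below 1. The more addresses sit in the reduced stratum, the closer to 1 the factor can stay.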


The study checked whether the tract assignments still fit current cooperation rates by comparing the new CAPI stratum assignment with the initial assignment. A smaller percentage of addresses shifted out of stratum 4 than out of the other strata. Because stratum 4 already held a very large number of addresses, this suggests that the total number of addresses shifting does not differ much across strata.

A larger percentage of tracts shifted than of valid addresses. This can be interpreted to mean that the tracts that shift are smaller, with fewer addresses, than the tracts that stay in their original CAPI stratum. The cooperation rate also increased with each successive stratum. Finally, the distribution of tracts and addresses across the CAPI strata changed considerably.
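
The comparison of tract shifts with address shifts can be sketched as below. The function name `shift_shares` and the five example tracts are hypothetical and serve only to show why the two percentages can differ.

```python
def shift_shares(tracts):
    """Return (share of tracts shifted, share of valid addresses shifted).

    `tracts` is an iterable of (old_stratum, new_stratum, valid_addresses).
    """
    tracts = list(tracts)
    total_addresses = sum(addresses for _, _, addresses in tracts)
    if not tracts or total_addresses <= 0:
        raise ValueError("need at least one tract with valid addresses")
    moved = [(old, new, addresses) for old, new, addresses in tracts if old != new]
    tract_share = len(moved) / len(tracts)
    address_share = sum(addresses for _, _, addresses in moved) / total_addresses
    return tract_share, address_share


# Hypothetical tracts: the two that change stratum are the smallest ones.
example = [
    (4, 4, 3000),
    (4, 4, 2500),
    (3, 3, 1600),
    (3, 4, 500),
    (2, 3, 400),
]
tract_share, address_share = shift_shares(example)
```

Because the two shifting tracts in this made-up example are small, the share of tracts that shift is much larger than the share of valid addresses that shift, which is the pattern described above.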

The Census vacancy and composite vacancy options produce similar tract distributions. However, this does not mean that the individual tracts assigned to each CAPI stratum are the same. The evaluated reduction factor is 0.92, which is close to the earlier value and suggests that the designs do not differ much. The Census vacancy option showed about 1,900 workload cases over the full year, while the composite vacancy option showed about 800, less than half as many.


A few issues were found in the initial CAPI stratum assignment. In the 1,900 counties examined, the cooperation rate differed from the expected values. This indicates inconsistencies in the initial sampling model, which was expected.

Both designs produced comparable distributions of valid addresses and tracts. One difference appeared in the percentage of the universe in stratum 4. A more substantial difference appeared in which tracts were assigned to each CAPI stratum, indicating that the ACS data did alter the results.

The shifting of tracts changed the CAPI workloads and the resulting reduction factor. The reduction factor was 0.91 under the Census vacancy option and about 0.92 under the composite vacancy option. The 0.92 value also matches the current reduction factor. A higher reduction factor means a smaller increase in CAPI workload.

Composite vacancy has a higher reduction factor because it places more addresses in stratum 4. The reduction is therefore spread over more addresses, so its effect on any single area is smaller. Composite vacancy also takes advantage of the timeliness of ACS data, which reduces the error associated with the Census 2000 vacancy rate. Because the reduction factor under composite vacancy equals the initial reduction factor, the design is stable over the years. Composite vacancy is therefore recommended for assigning sampling strata.


This study contributes a redesign of the CAPI sample selection used in ACS data collection. The analysis uses data gathered by the ACS and compares the pre-2005 data with recently collected data. It proposes a method for calculating cooperation rates, which are then used to assign tracts to the new CAPI sampling strata. The results show that the older model had many irregularities and that composite vacancy can be a better basis for assigning sampling strata. The resulting assignments were used in selecting the 2008 ACS housing unit sample.


  1. Castro Jr, Edward C., and Steven P. Hefter. “Redesigning the American Community Survey computer assisted personal interview sample.” Proceedings of the Survey Research Methods Section, American Statistical Association. 2008.