CHFS&CMES

The Survey and Research Center for China Household Finance is a non-profit academic survey and research institute established in 2010 by the Research Institute of Economics and Management at Southwestern University of Finance and Economics (SWUFE). It has since grown into an internationally renowned academic survey institute with comprehensive micro data, including three databases covering Chinese households, small and micro enterprises, and community governance.

CHFS is the Center's first national survey. It collects micro-level information about household financial assets (including securities) and physical assets (housing and other property), debts and credit constraints, income, expenditures, insurance and social insurance, intergenerational transfer payments, demographics, employment, and payment history. These data are compiled and analyzed to provide high-quality micro data on Chinese household finances for academic research and policy making.

In July and August 2011, the Center sent more than 600 SWUFE students to interview households, yielding 8,438 valid household samples from 320 communities in 80 counties across 25 provinces.

From June to September 2013, the Center sent more than 1,600 SWUFE students to interview households, yielding 28,000 valid household samples from 1,048 communities in 260 counties across 29 provinces. The data are nationally and provincially representative.

From April to September 2015, the Center sent more than 2,500 SWUFE students to interview households, yielding 40,000 valid household samples from 1,439 communities in 363 counties across 29 provinces. The data are representative at the national, provincial, and sub-provincial levels.

Technologies such as Computer Assisted Personal Interviewing (CAPI) and Computer Assisted Telephone Interviewing (CATI) helped raise the response rate and reduce refusals.

We released the survey data and findings from the first round on May 13, 2012, and they were well received by academics and the general public. We have also published two books in English, Research Report of China Household Finance Survey 2012 and Report on the Development of Household Finance in Rural China (2014), which present the results and analysis of the survey and are available to the public.


CHFS Sampling Design 

The sampling design for the China Household Finance Survey (CHFS) has two major components: an overall sampling scheme and an onsite sampling scheme based on mapping. The design serves two goals: to draw a random sample that is representative of all Chinese households, and to provide sufficient data to answer important research questions, such as those on household asset allocation, consumption, and saving. To achieve these goals, the sampling design has four features. First, it oversamples relatively wealthy regions. Second, it oversamples urban areas. Third, the sample is representative of China's diverse geographic regions. Fourth, other things being equal, it uses the least costly procedures.
     
1. The overall sampling scheme
This project employs a stratified three-stage probability-proportional-to-size (PPS) random sampling design. The primary sampling units (PSUs) are 2,585 counties (including county-level cities and districts) in all provinces and provincial-level municipalities of China except Tibet, Xinjiang, Inner Mongolia, Hong Kong, Macau, and Taiwan. The second stage selects residential committees or villages within the counties chosen in the first stage. The third stage selects households within the residential committees or villages chosen in the second stage. Each stage uses the PPS method, with units weighted by their population size. The target sample size was set at between 8,000 and 8,500 households.
In practice, we selected about 80 counties from the PSUs, then 4 residential communities from each of the 80 counties, and then 20 to 50 households from each selected residential community, depending on its level of urbanization and economic development. The average number of households per residential community is 25, which gives a sample size of 4 × 25 × 80 = 8,000.
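To make concrete what "weighted by population size" implies, the textbook PPS inclusion probability for drawing n of N units with size measures M_i (population or household counts) is shown below. It comes with the resulting design weight, the product rule across the three stages, and the sample-size arithmetic above. This is a generic sketch, not necessarily the exact formula CHFS uses to build its published weights.

```latex
% Stage-wise PPS inclusion probability (valid when n M_i <= sum_j M_j) and design weight
\pi_i = \frac{n\,M_i}{\sum_{j=1}^{N} M_j}, \qquad w_i = \frac{1}{\pi_i}

% A household's overall selection probability is the product over the three stages
\pi_{\text{household}} = \pi^{(1)}_{\text{county}} \cdot \pi^{(2)}_{\text{community}\mid\text{county}} \cdot \pi^{(3)}_{\text{household}\mid\text{community}}

% Target sample size: counties x communities per county x average households per community
n_{\text{total}} \approx 80 \times 4 \times 25 = 8{,}000
```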
 (1) The first-stage sampling
The first stage selects 80 counties out of the 2,585 PSUs. Ideally, the 80 counties should both cover China's diverse geographic regions and include enough observations from relatively wealthy areas. To achieve this, we sort the 2,585 counties into 10 strata by GDP per capita. Within each stratum, 8 counties are drawn with PPS, with each county weighted by its population size. This yields 80 counties in 25 provinces. Table 1 compares descriptive statistics of GDP per capita for the 80 selected counties with the national statistics; as the table shows, the two are very close.
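The first stage can be sketched as follows. The text does not say how the 10 strata boundaries are set or which PPS variant is used. This sketch therefore assumes equal-count strata after sorting by GDP per capita and uses systematic PPS. The county records and the helper names (`County`, `pps_systematic`, `first_stage`) are hypothetical, and the toy frame is random, not real data.

```python
import random
from dataclasses import dataclass


@dataclass
class County:
    name: str
    gdp_per_capita: float
    population: int


def pps_systematic(units, sizes, n, rng):
    """Draw n units with probability proportional to size (systematic PPS).

    Assumes no unit's size exceeds sum(sizes) / n; otherwise a unit could be hit twice.
    """
    step = sum(sizes) / n
    start = rng.uniform(0, step)  # one random start on the cumulative-size scale
    targets = [start + k * step for k in range(n)]
    chosen, cumulative, t = [], 0.0, 0
    for unit, size in zip(units, sizes):
        cumulative += size
        # A unit is selected once for every target point inside its size interval.
        while t < n and targets[t] < cumulative:
            chosen.append(unit)
            t += 1
    return chosen


def first_stage(counties, n_strata=10, per_stratum=8, seed=0):
    """Sort by GDP per capita, cut into equal-count strata, PPS-draw by population."""
    rng = random.Random(seed)
    ordered = sorted(counties, key=lambda c: c.gdp_per_capita)
    width = len(ordered) / n_strata
    sample = []
    for s in range(n_strata):
        stratum = ordered[round(s * width):round((s + 1) * width)]
        sample += pps_systematic(stratum, [c.population for c in stratum], per_stratum, rng)
    return sample


# Toy frame of 200 made-up counties (not real statistics).
toy = random.Random(42)
counties = [
    County(f"county_{i}", toy.uniform(5_000, 150_000), toy.randint(200_000, 400_000))
    for i in range(200)
]
sample = first_stage(counties)
print(len(sample), "counties selected")
```

Systematic PPS is used here because it guarantees exactly `per_stratum` draws per stratum with a single random number. Any other without-replacement PPS scheme would serve the same illustrative purpose.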

To examine the geographic distribution of counties selected under this scheme, we repeated the PPS sampling procedure 1,000 times in a random simulation and compared the average result with the national statistics. The small standard deviations in Table 2 suggest that the scheme produces consistent geographic distributions of selected counties across trials. On average, the shares of selected counties in Eastern, Central, and Western China are about 37:30:33. Compared with the national statistics, the share of counties from Eastern China is slightly higher. However, this does not pose a serious problem, because our priority is a geographically balanced distribution of counties and cities across China. In the final sample of 80 counties and cities in 25 provinces, 32 are in Eastern China, 27 in Central China, and 21 in Western China.
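The repeated-draw check can be sketched as a small Monte Carlo loop. It reruns the stratified PPS draw, records each region's share of the 80 selected counties, and reports the mean and standard deviation across trials. The sketch assumes equal-count GDP strata and systematic PPS. The region labels, tuple layout, and function names are hypothetical, and the toy frame is random, so its output says nothing about the Table 2 figures.

```python
import random
import statistics
from collections import Counter

REGIONS = ("East", "Central", "West")


def pps_systematic(units, sizes, n, rng):
    # Systematic PPS: one random start, then n equally spaced points on the cumulative-size scale.
    step = sum(sizes) / n
    start = rng.uniform(0, step)
    targets = [start + k * step for k in range(n)]
    chosen, cumulative, t = [], 0.0, 0
    for unit, size in zip(units, sizes):
        cumulative += size
        while t < n and targets[t] < cumulative:
            chosen.append(unit)
            t += 1
    return chosen


def draw_counties(counties, rng, n_strata=10, per_stratum=8):
    # counties: (region, gdp_per_capita, population) tuples; strata = equal-count GDP groups.
    ordered = sorted(counties, key=lambda c: c[1])
    width = len(ordered) / n_strata
    chosen = []
    for s in range(n_strata):
        stratum = ordered[round(s * width):round((s + 1) * width)]
        chosen += pps_systematic(stratum, [c[2] for c in stratum], per_stratum, rng)
    return chosen


def simulate_region_shares(counties, trials=1000, seed=0):
    """Repeat the first-stage draw; return {region: (mean share in %, standard deviation)}."""
    rng = random.Random(seed)
    shares = {r: [] for r in REGIONS}
    for _ in range(trials):
        counts = Counter(c[0] for c in draw_counties(counties, rng))
        total = sum(counts.values())
        for r in REGIONS:
            shares[r].append(100 * counts[r] / total)
    return {r: (statistics.mean(v), statistics.stdev(v)) for r, v in shares.items()}


# Made-up frame of 300 counties; regions and sizes are random, not real statistics.
toy = random.Random(1)
counties = [
    (toy.choice(REGIONS), toy.uniform(5_000, 150_000), toy.randint(200_000, 400_000))
    for _ in range(300)
]
for region, (mean, sd) in simulate_region_shares(counties).items():
    print(f"{region}: mean {mean:.1f}%, sd {sd:.1f}")
```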

      (2) The second-stage sampling
At this stage, we select residential communities from the selected counties. The key decision is the ratio of urban residential committees to rural villages. Drawing the sample by household registration would yield fewer observations from urban areas. Because a key purpose of the survey is to study household assets, and urban residents are likely to hold more assets, we oversample the urban population as follows.
First, we sort the counties by their share of non-agricultural population and divide them into five groups (quintiles).
Second, for counties in the top quintile, with the highest share of non-agricultural population, the ratio of sampled urban residential communities to sampled rural villages is 4:0.
Third, for counties in the next quintile down, the ratio of sampled urban residential communities to sampled rural villages is 3:1.
Fourth, continuing in the same way, for counties in the bottom quintile, with the lowest share of non-agricultural population, the ratio of sampled urban residential communities to sampled rural villages is 0:4.
This scheme yields two separate sampling frames, one urban and one rural. Given the number of residential communities or villages to be drawn from each frame, we then draw them with PPS, using the number of households in each residential community or village as its size. Table 3 shows the distribution of urban residential communities in the 80 counties.
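The quintile rule above can be sketched as a lookup from each county's rank to its (urban, rural) allocation. The text states the ratios only for the top quintile (4:0), the second quintile (3:1), and the bottom quintile (0:4). The 2:2 and 1:3 values for the third and fourth quintiles are an assumption read from "continuing in the same way." The function name and the toy shares are hypothetical.

```python
# (urban residential committees, rural villages) sampled per county, by quintile
# of non-agricultural population share (quintile 1 = highest share).
# Quintiles 1, 2 and 5 come from the text; 3 and 4 are assumed to follow the pattern.
ALLOCATION = {1: (4, 0), 2: (3, 1), 3: (2, 2), 4: (1, 3), 5: (0, 4)}


def allocate_communities(non_ag_share):
    """Return {county: (n_urban, n_rural)} from {county: non-agricultural share}."""
    ranked = sorted(non_ag_share, key=non_ag_share.get, reverse=True)
    n = len(ranked)
    # rank * 5 // n maps ranks 0..n-1 onto quintiles 0..4 with (near-)equal counts.
    return {county: ALLOCATION[rank * 5 // n + 1] for rank, county in enumerate(ranked)}


shares = {f"county_{i}": 1 - i / 10 for i in range(10)}  # hypothetical shares
plan = allocate_communities(shares)
print(plan["county_0"], plan["county_9"])  # (4, 0) (0, 4)
```

Summing the first and second elements of `plan` gives the number of communities to draw from the urban and rural frames, respectively, before the PPS step within each frame.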
    

      (3) The third-stage sampling
      The last stage of CHFS sampling selects households from the chosen residential communities. In each rural village, we randomly draw 20 households. In urban areas, the number of households selected varies with housing prices: we rank the urban residential communities by the average housing price of each neighborhood and divide them into quartiles. In the top quartile, where the average housing price is highest, we draw 50 households from each residential community; in the bottom quartile, where it is lowest, we draw only 25. This gives the sample a larger number of wealthy households. See Table 4 for the distribution of the number of households across urban residential communities.
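The urban quartile rule can be sketched as follows. Communities are ranked by average housing price and assigned a per-community household count by quartile. Only the bottom-quartile (25) and top-quartile (50) counts are given in the text, so the sketch takes the full mapping as an argument. The middle values in the usage example are hypothetical placeholders, as are the function name and the prices.

```python
def urban_takes(avg_price, take_by_quartile):
    """Return {community: households to draw} for urban residential communities.

    avg_price: {community: average housing price}
    take_by_quartile: {1: ..., 2: ..., 3: ..., 4: ...}, where quartile 4 is the most expensive.
    """
    ranked = sorted(avg_price, key=avg_price.get)  # cheapest community first
    n = len(ranked)
    # rank * 4 // n maps ranks 0..n-1 onto quartiles 0..3 with (near-)equal counts.
    return {c: take_by_quartile[rank * 4 // n + 1] for rank, c in enumerate(ranked)}


# 25 (bottom quartile) and 50 (top quartile) come from the text; the middle two
# values are hypothetical placeholders because the text does not give them.
TAKE = {1: 25, 2: 30, 3: 40, 4: 50}
prices = {f"community_{i}": 5_000 + 1_000 * i for i in range(8)}  # made-up prices
takes = urban_takes(prices, TAKE)
print(takes["community_0"], takes["community_7"])  # 25 50
```

Rural villages need no ranking: the text fixes their take at 20 households each.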
    
       2. The onsite sampling scheme

      (1) Mapping residential areas
      Onsite sampling is based on mapping the residential areas and compiling household lists for each area. The precision of the map directly affects the quality of this last stage of sampling.

      CHFS developed a geographic information sampling system that uses remote sensing, GPS, and GIS technologies to collect geographic information on the target areas. The fine-grained digital imagery and vector maps used in mapping come from the Institute of Geographic Information of the Chinese Academy of Sciences. In the field, our trained mapping technicians use electronic measuring instruments and GPS to collect accurate electronic data, which are transferred automatically to computers to produce high-quality vector maps. Because the geography may change after the initial data collection, we also manually check and record any changes at later stages. In this way, we ensure that the geographic information in our system matches conditions on the ground.
      The system not only allows mapping technicians to mark household locations directly on the electronic map, but also stores the household location information used in the final stage of sampling. This improves efficiency, reduces errors in mapping and sampling, and helps protect the privacy of household information. Our working procedure is illustrated in Figure 1 below.

     (2) Selecting households

      We use an equal-interval (systematic) sampling procedure to draw households from the household list compiled at the mapping stage.
      First, we calculate the sampling interval, that is, the number of households on the list per selected household, using the following formula:
      Sampling interval = total number of households in the community / number of households to be selected (rounded up to the nearest integer)
For example, to draw 30 households from a residential committee or village with 100 households, we compute 100/30 ≈ 3.33, so the sampling interval is 4.
Second, the random starting point is set by the units digit of the minute on the clock when the procedure is carried out. For example, if the time is 15:34, the starting point is 4; if it is 12:03, the starting point is 3.

Third, we draw the households. The first household selected is the one at the starting-point position on the household list. Continuing the example above, with a starting point of 3 and a sampling interval of 4, the 3rd household on the list is selected first, followed by the 7th, 11th, 15th, 19th, and so on down the list.
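The three steps can be sketched as follows: a rounded-up interval, a clock-based start, and every k-th household from the start. Note that with the worked numbers (100 households, interval 4, start 3), following the list from position 3 reaches only positions 3, 7, ..., 99, which is 25 households rather than 30. The text does not say how such a shortfall is made up (for example, by wrapping back to the top of the list), so the sketch simply stops at the end of the list. It also rejects a start of 0, a case the text does not cover. Function names are hypothetical.

```python
import math
from datetime import time


def sampling_interval(n_households, n_to_select):
    # Interval = list length / number to select, rounded up to the nearest integer.
    return math.ceil(n_households / n_to_select)


def start_from_clock(t):
    # Random start = units digit of the minute (15:34 -> 4, 12:03 -> 3).
    return t.minute % 10


def equal_interval_sample(household_list, n_to_select, start):
    """Select households at 1-based positions start, start + k, start + 2k, ...

    Stops after n_to_select households or at the end of the list, whichever comes
    first; how a shortfall is made up is not specified in the text.
    """
    if not 1 <= start <= len(household_list):
        raise ValueError("starting point must be a position on the list (1..N)")
    k = sampling_interval(len(household_list), n_to_select)
    positions = range(start, len(household_list) + 1, k)
    return [household_list[p - 1] for p in positions][:n_to_select]


households = [f"household_{i}" for i in range(1, 101)]  # hypothetical 100-household list
start = start_from_clock(time(12, 3))  # 3
sample = equal_interval_sample(households, 30, start)
print(sample[:5])   # ['household_3', 'household_7', 'household_11', 'household_15', 'household_19']
print(len(sample))  # 25: positions 3, 7, ..., 99 exhaust the list before 30 are drawn
```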

Sample size 

The 2011 CHFS collected information from 8,438 households comprising 29,463 individuals. The weighted average household size is 2.94 persons: 2.67 for urban households and 3.18 for rural households.