Comparative Analysis of the Sample Structure of Migrants in Large-Scale Social Survey Data in China

Zhou Hao; Lei Linxuan
Year.Issue:Page: 2025.3:114-128
Keywords:
Sample Survey; Floating Population; Measurement; Sample Structure
ABSTRACT

Data form the foundation of social science research. This paper compares the definitions and data collection methods for the "floating population" across six large-scale social surveys in China, highlighting structural differences between the survey samples and the national floating population. Using a consistent model, it demonstrates how sample structure affects analytical outcomes. Key findings include: (1) significant definitional differences across surveys yield different study populations; (2) estimated floating population proportions differ by survey, with both inter-provincial and intra-county migration rates generally lower than those in the 2015 1% National Population Sample Survey; (3) floating population samples vary widely in gender, age, education, and urban-rural distribution; and (4) the estimated effects of basic demographic characteristics differ in both significance and direction, revealing common social patterns as well as sample-specific biases. The study underscores the importance of measurement for defining the research population, measuring variables, and interpreting statistical outcomes. It recommends using census data or the 2015 1% National Population Sample Survey to estimate the total floating population, emphasizing patterns that are common across samples to reflect social realities, and standardizing floating population metrics across surveys.

BACKGROUND

Survey data serve as a critical empirical foundation in social science research, where measurement accuracy and sample representativeness directly affect the reliability of empirical knowledge and the validity of causal inference. In studies of the floating population, survey data involve not only the definition and identification of the “floating population” but also its representation in the sample. Structural biases in these samples may distort our understanding of floating population trends (such as direction, scale, and distribution) and further compromise research conclusions and policy implications. Although existing literature has examined methodological issues in floating population surveys, such as sampling design, sampling frame construction, and site selection, few studies have systematically compared the actual results of different large-scale surveys in terms of the structure and representativeness of their floating population samples. This lack of comparative analysis raises serious concerns about the validity of research based on these data.

OBJECTIVE

We aim to systematically compare the definitions and sample structures of “floating populations” across six major large-scale social surveys in China, so as to investigate the extent to which these sample structures differ from the national floating population, and examine how such structural differences affect statistical results.

METHODS

This study draws on six of the most widely used nationally representative large-scale social surveys in China: CFPS, CGSS, CHFS, CHNS, CSS, and CLDS. To improve comparability across datasets, the following criteria were applied in data selection and processing:

(1) Given the variation in survey years, the 2015 1% National Population Sample Survey is used as the reference benchmark.

(2) For each survey, data from the 2015 wave or the nearest available wave were selected. Specifically, the 2014 wave of CLDS and the 2016 wave of CFPS were used, due to the panel survey design of CLDS and CFPS.

(3) All analyses in this study are based on weighted data to better reflect population-level representation.
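
To illustrate why weighting matters for criterion (3), the sketch below computes weighted category shares for a migration-status variable. It is a minimal illustration, not the authors' procedure: the function name `weighted_shares`, the field names `migrant` and `weight`, and the four toy records are all assumptions invented for this example.

```python
from collections import defaultdict


def weighted_shares(records, key, weight="weight"):
    """Return the weighted share of each category of `key`.

    `records` is a list of dicts; `weight` names the sampling-weight field.
    Both field names are hypothetical, chosen only for this sketch.
    """
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += r[weight]
    grand_total = sum(totals.values())
    return {category: w / grand_total for category, w in totals.items()}


# Toy records invented for illustration; they are not survey data.
sample = [
    {"migrant": "non-migrant", "weight": 3.0},
    {"migrant": "inter-provincial", "weight": 1.0},
    {"migrant": "non-migrant", "weight": 5.0},
    {"migrant": "intra-county", "weight": 1.0},
]

shares = weighted_shares(sample, "migrant")
for category, share in shares.items():
    print(f"{category}: {share:.2f}")
```

In this toy sample, non-migrants make up half of the unweighted records but a larger weighted share, because their records carry larger weights. Weighted shares of this kind are what would be compared against a benchmark such as the 2015 1% National Population Sample Survey.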

The analysis proceeds in three steps: first, the definitions of “floating population” are compared across surveys; second, the sample structures of the floating populations are compared; and finally, under a unified model specification, regression models with “years of education” and “migrant status” as dependent variables are used to assess how sample structure affects statistical outcomes.
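
The third step, fitting one specification to every survey and comparing coefficients, can be sketched as follows. This is only a simplified illustration under stated assumptions: it uses a single-predictor weighted least squares slope (years of education regressed on a 0/1 migrant indicator), whereas the paper's actual specification is not given here. The names `wls_slope` and `fit_all`, the labels `survey_A` and `survey_B`, and all numbers are hypothetical.

```python
def wls_slope(x, y, w):
    """Weighted least squares slope of y on a single predictor x."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    return cov / var


def fit_all(surveys):
    """Apply the same specification to every survey and collect the slopes."""
    return {
        name: wls_slope(d["migrant"], d["edu_years"], d["weight"])
        for name, d in surveys.items()
    }


# Toy data invented for illustration; labels and values are not real surveys.
surveys = {
    "survey_A": {
        "migrant": [0, 0, 1, 1],
        "edu_years": [9, 9, 12, 12],
        "weight": [1.0, 1.0, 1.0, 1.0],
    },
    "survey_B": {
        "migrant": [0, 0, 1, 1],
        "edu_years": [9, 11, 9, 11],
        "weight": [1.0, 1.0, 1.0, 1.0],
    },
}

coefs = fit_all(surveys)
for name, slope in coefs.items():
    print(f"{name}: migrant coefficient = {slope:.2f}")
```

Because the specification is held fixed, any difference in the estimated coefficient between the two toy samples comes from who is in each sample rather than from the model, which is the logic behind comparing results across surveys.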

RESULTS

The analysis yields several key findings:

(1) Definitions of floating population differ substantially across surveys, reflecting varied target populations, though the definitions of all surveys are based on Hukou registration;

(2) The estimated proportions of floating population differ substantially across surveys, with notably lower shares of inter-provincial and intra-county floating population compared to those reported in the 2015 1% National Population Sample Survey;

(3) Floating population samples across surveys exhibit marked structural differences in key demographic attributes;

(4) Under identical model specifications, the estimated coefficients of basic variables differ significantly in terms of magnitude and significance, revealing both common patterns and the effects of sample structural bias on results.

CONCLUSIONS

This study highlights the structural inconsistencies among floating population samples in six large-scale social surveys, which are likely rooted in differences in measurement and sampling design. Researchers should carefully examine the design and methodology of survey data before using them, in order to understand the applicability and potential limitations of the data. When comparing findings across surveys, attention should be paid to consistent patterns, while discrepancies should be interpreted with caution and attributed, where appropriate, to structural bias rather than assumed heterogeneity.

CONTRIBUTION

This study provides the first systematic comparison of how floating populations are measured and sampled in six nationally significant surveys in China, and empirically evaluates the implications of sample differences for statistical analysis. It underscores the potential risks of sample bias in social science research and emphasizes the importance of accurate measurement, understanding sampling design, and cautious interpretation of data. The findings suggest that greater alignment in measurement definitions and statistical standards across surveys would enhance comparability and enable more reliable cross-survey validation, thereby strengthening the empirical foundation for floating population research in China.