PROGRESS IN GEOGRAPHY ›› 2016, Vol. 35 ›› Issue (12): 1494-1505.doi: 10.18306/dlkxjz.2016.12.006

• Original Article

Multiple scale spatialization of demographic data with multi-factor linear regression and geographically weighted regression models

Kejing WANG1,2, Hongyan CAI1,*, Xiaohuan YANG1

  1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China
    2. Zhejiang Academy of Surveying & Mapping, Hangzhou 310012, China
  • Online:2016-12-20 Published:2016-12-20
  • Contact: Hongyan CAI E-mail:wkj_3210@163.com;caihy@igsnrr.ac.cn
  • Supported by:
    National Natural Science Foundation of China, No.41271173, No.41301155;National Science and Technology Support Program of China, No.2012BAI32B06

Abstract:

Population distribution data are essential for socioeconomic and environmental studies, such as population estimation, the spread of disease, natural disaster relief, and environmental protection. Existing research has shown that spatialized population grid data can precisely delineate the spatial pattern of population distribution, while model selection and grid size may influence the accuracy of population distribution modeling. It is therefore important to estimate population distribution using appropriate models and at a proper spatial scale. This study focused on spatializing the 2010 county-level population census data of Anhui Province at three grid scales. Anhui Province was selected because of its complex landforms and the marked differences in population distribution within its area. Population regionalization was carried out as a preprocessing step: 78 counties in Anhui Province were divided into four groups. Urban residential areas were reclassified by combining land-use data with nighttime light data (DMSP/OLS) to reflect regional differences. Based on the population regionalization, multi-factor linear regression (MFLR) and geographically weighted regression (GWR) models were employed to integrate the reclassified urban residential land-use data with the rural residential land-use data. This study established three population spatial datasets at 1 km, 5 km, and 10 km grid scales. A comparison of the two models' precision at each scale shows that both the model and the grid scale strongly influence the accuracy of the spatialization result. For the MFLR model, the accuracy increased with the grid scale, and the highest accuracy was achieved in the 10 km grid dataset. For the GWR model, the accuracy decreased as the grid scale increased, and the highest model accuracy was obtained at the 1 km scale. Overall, the GWR model, which takes geographic location into account through local modeling, had a higher accuracy (22.31%) than the MFLR model. This study may provide a scientific basis for the production and application of population spatial data and a reference for the spatialization of other types of statistical data in the future.
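The difference between the two model types named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes two predictors only (urban and rural residential land area per county), no intercept, a Gaussian distance kernel with a fixed bandwidth, and entirely made-up county values. The function names (`fit_mflr`, `fit_gwr`, `weighted_least_squares`, `mean_relative_error`) are hypothetical. MFLR is treated here as a single global least-squares fit, while GWR refits the same regression at every county with weights that decay with distance.

```python
import math


def weighted_least_squares(X, y, w):
    """Fit y ~ a*x1 + b*x2 (no intercept) by weighted least squares.

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    s11 = sum(wi * x[0] * x[0] for x, wi in zip(X, w))
    s12 = sum(wi * x[0] * x[1] for x, wi in zip(X, w))
    s22 = sum(wi * x[1] * x[1] for x, wi in zip(X, w))
    t1 = sum(wi * x[0] * yi for x, yi, wi in zip(X, y, w))
    t2 = sum(wi * x[1] * yi for x, yi, wi in zip(X, y, w))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("predictors are collinear under these weights")
    return ((t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det)


def fit_mflr(X, y):
    """Global linear regression: one coefficient pair for all counties."""
    return weighted_least_squares(X, y, [1.0] * len(y))


def fit_gwr(coords, X, y, bandwidth):
    """Local regression: one coefficient pair per county.

    Assumption: Gaussian kernel, fixed bandwidth in coordinate units.
    """
    betas = []
    for cx, cy in coords:
        w = [
            math.exp(-0.5 * (math.hypot(cx - px, cy - py) / bandwidth) ** 2)
            for px, py in coords
        ]
        betas.append(weighted_least_squares(X, y, w))
    return betas


def mean_relative_error(y_true, y_pred):
    """Average of |predicted - observed| / observed."""
    return sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: six counties on a line, with population densities
# (people per km^2 of residential land) that drift from west to east.
coords = [(float(i), 0.0) for i in range(6)]
urban_km2 = [10.0, 4.0, 12.0, 6.0, 15.0, 8.0]
rural_km2 = [30.0, 45.0, 20.0, 50.0, 25.0, 40.0]
X = list(zip(urban_km2, rural_km2))
y = [
    (6000 + 800 * cx) * u + (900 + 100 * cx) * r
    for (cx, _), (u, r) in zip(coords, X)
]

a, b = fit_mflr(X, y)
mflr_pred = [a * u + b * r for u, r in X]

local = fit_gwr(coords, X, y, bandwidth=1.5)
gwr_pred = [la * u + lb * r for (la, lb), (u, r) in zip(local, X)]

print("MFLR mean relative error:", mean_relative_error(y, mflr_pred))
print("GWR  mean relative error:", mean_relative_error(y, gwr_pred))
```

In practice the bandwidth would be chosen by cross-validation or an information criterion, and the fitted coefficients would then be applied to residential land areas within each grid cell. Both steps are outside this sketch.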

Key words: population distribution, spatialization, multi-scales, multi-factor linear regression, Geographically Weighted Regression (GWR), Anhui Province