PROGRESS IN GEOGRAPHY ›› 2021, Vol. 40 ›› Issue (10): 1664-1677. doi: 10.18306/dlkxjz.2021.10.005

• Articles •

Modeling urban residential land price distribution using multi-source data and ensemble learning: A case of Wuhan City

ZHANG Peng1,2, HU Shougeng1,2,*, YANG Shengfu1,2, CHENG Peikun1,3

    1. School of Public Administration, China University of Geosciences, Wuhan 430074, China
    2. Key Laboratory for Rule of Law Research, Ministry of Natural Resources, Wuhan 430074, China
    3. Xi'an Lintong Bureau of Investment Cooperation, Lintong 710600, Shaanxi, China
  • Received: 2020-11-22 Revised: 2021-06-15 Online: 2021-10-28 Published: 2021-12-28
  • Contact: HU Shougeng E-mail: zhangpeng_cug@163.com; husg2009@gmail.com
  • Supported by:
    Major Program of National Social Science Foundation of China (18ZDA053)

Abstract:

Characterizing the spatial distribution of urban residential land prices (RLPs) is essential for the timely improvement of urban planning and management and for achieving urban smart growth. However, mapping urban RLPs at a fine scale remains challenging because of the complex nonlinear relationship between RLPs and their potential determinants. This study developed a grid-level urban RLP mapping method based on big geo-data and ensemble learning to meet the need for rapid and accurate monitoring of urban RLP dynamics. Using predictor variables extracted from points of interest (POIs) and NPP-VIIRS nighttime light images, fine-scale RLPs in Wuhan City in 2018 were mapped in three steps. First, the kernel density of POIs and the intensity of nighttime lights were extracted and aggregated at the 500 m × 500 m grid level as predictor variables of RLPs. Second, several RLP prediction models were built using four individual machine learning algorithms (MLAs) and the bagging and stacking ensemble methods. Finally, the prediction errors of the models were evaluated and compared, and the best-performing model was selected to estimate the RLPs of grids without observations in Wuhan City. The results show that: 1) Among the individual MLAs, support vector regression (SVR) has the best prediction performance, followed by the k-nearest neighbor (k-NN), Gaussian process regression (GPR), and back propagation neural network (BP-NN) algorithms. 2) In improving the prediction accuracy of individual MLAs, the stacking method outperforms the bagging method. The stacking #1 model, which integrates the SVR and k-NN algorithms, has the smallest prediction error, with a %MAE of 8.29% and an R² of 0.814. 3) The RLP map generated by the proposed methodological framework effectively reveals the circular characteristics and local singularity of the RLP distribution. This study provides new ideas, methods, and technical means for rapidly and accurately mapping urban RLPs, which is conducive to improving urban RLP monitoring systems in the era of big data.
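
The two accuracy measures quoted in the abstract can be illustrated with a minimal Python sketch. The abstract does not define %MAE, so the code assumes the common reading of mean absolute error divided by the mean observed price, expressed as a percentage; the paper's own definition may differ. R² is computed as the standard coefficient of determination. The function names and the four sample prices are hypothetical and are not taken from the study.

```python
from statistics import fmean


def percent_mae(observed, predicted):
    """Mean absolute error as a percentage of the mean observed value.

    Assumed definition: the abstract reports %MAE without defining it.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mae = fmean(abs(o - p) for o, p in zip(observed, predicted))
    return 100.0 * mae / fmean(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical grid-level land prices, for illustration only (not study data).
observed = [8000.0, 10000.0, 12000.0, 14000.0]
predicted = [8500.0, 9500.0, 12500.0, 13500.0]

print(f"%MAE = {percent_mae(observed, predicted):.2f}%")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

Under this assumed definition, a lower %MAE and a higher R² both indicate a better fit, which matches how the abstract ranks the stacking #1 model against the other models.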

Key words: urban residential land price, land price distribution, machine learning, POI, nighttime light, Wuhan City