Progress in Geography, 2019, Vol. 38, Issue (7): 1009-1020. doi: 10.18306/dlkxjz.2019.07.006

• Special column: the Belt and Road

Big data analysis of social development situation in regions along the Belt and Road

Mingqing MA1,2, Wu YUAN3, Quansheng GE1, Wen YUAN1,*, Linsheng YANG1, Hanqing LI4, Meng LI1,2

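  1. Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China
    2. University of the Chinese Academy of Sciences, Beijing 100049, China
    3. Computer School, Beijing Institute of Technology, Beijing 100081, China
    4. First Research Institute of the Ministry of Public Security of PRC, Beijing 100048, China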
  • Received: 2019-01-18  Revised: 2019-05-15  Online: 2019-07-28  Published: 2019-07-28
  • Corresponding author: Wen YUAN. E-mail: mamq.16s@igsnrr.ac.cn (Mingqing MA); yuanw@lreis.ac.cn (Wen YUAN)
  • About the author:

    Mingqing MA (born 1994), male, from Yili, Xinjiang; master's student whose research focuses on spatiotemporal data mining. E-mail: mamq.16s@igsnrr.ac.cn

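  • Supported by:
    Strategic Priority Research Program of the Chinese Academy of Sciences (Class A), No. XDA23100103; Key Project of the Chinese Academy of Sciences, No. ZDRW-ZS-2017-4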



Abstract:

The Belt and Road Initiative has become a fundamental international policy of China, and keeping abreast of the social development situation in countries along the Belt and Road is crucial to ensuring the steady progress and successful implementation of the initiative. To this end, this study used the Global Data on Events, Location and Tone (GDELT) as the data source and collected full-text English news articles on 25 countries along the Belt and Road from the past five years. It introduced topic models, combining an unsupervised method, latent Dirichlet allocation (LDA), with a supervised method, labeled latent Dirichlet allocation (Labeled LDA), to mine the topics contained in the news data; on this basis, it constructed a social stability model and analyzed the social development situation of each country. The study found that: 1) The social development situation of the countries along the Belt and Road is uneven, and the countries can be divided into four categories: stable (e.g., Oman and Vietnam); relatively stable (e.g., Uzbekistan and Iran); relatively high risk (e.g., Kuwait, Jordan, Pakistan, and Myanmar); and high risk (e.g., Syria and Afghanistan). 2) Spatiotemporal mining of news topics can effectively identify hot spots; for example, this study found that Andijon has an important influence on the social development and stability of Central Asia. 3) The supervised topic model can reveal Uzbekistan's economic and industrial structure, identify major social events, and detect the country's social security risks and how they change over time. The proposed method can effectively uncover the spatiotemporal patterns of news events, detect potential risks in individual countries, and support real-time dynamic monitoring of the social development situation of countries along the Belt and Road. It can therefore provide auxiliary decision support for implementing the Belt and Road Initiative and has significant application value.
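The abstract names the two topic models and a social stability model but does not give their formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. `gibbs_lda` is a hypothetical helper implementing a generic collapsed Gibbs sampler for LDA; passing per-document label sets restricts each document's topics to its labels, which is the Labeled LDA constraint. `stability_index` (one minus a country's mean topic share on risk-related topics) and the cut-offs in `classify` are hypothetical stand-ins for the paper's social stability model, and the countries, tokens, and labels are toy data.

```python
import random


def gibbs_lda(docs, n_topics, labels=None, alpha=0.1, beta=0.01, iters=100, seed=0):
    """HYPOTHETICAL helper: collapsed Gibbs sampler for LDA (labels=None) or Labeled LDA.

    docs   : list of token lists (one per news article).
    labels : optional list of label sets (topic ids), one per document. When given,
             each token may only be assigned to one of its document's labels
             (the Labeled LDA constraint: one topic per label).
    Returns theta, the estimated per-document topic proportions.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    V = len(vocab)
    if labels is None:
        allowed = [list(range(n_topics)) for _ in docs]
    else:
        allowed = [sorted(lab) for lab in labels]  # each label set must be non-empty
    n_dk = [[0] * n_topics for _ in docs]       # document-topic counts
    n_kw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    n_k = [0] * n_topics                         # total tokens per topic
    z = []                                       # topic assignment of every token
    for d, doc in enumerate(docs):              # random initial assignments
        z_d = []
        for w in doc:
            k = rng.choice(allowed[d])
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][vocab[w]] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], vocab[w]
                # remove the current token from all counts
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # full conditional p(z = t | rest), restricted to the allowed topics
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in allowed[d]]
                k = rng.choices(allowed[d], weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    theta = []
    for d in range(len(docs)):
        total = sum(n_dk[d][t] + alpha for t in allowed[d])
        theta.append([(n_dk[d][t] + alpha) / total if t in allowed[d] else 0.0
                      for t in range(n_topics)])
    return theta


def stability_index(theta_by_country, risk_topics):
    """HYPOTHETICAL index: 1 minus the mean share of topic mass on risk-related topics."""
    scores = {}
    for country, thetas in theta_by_country.items():
        risk = sum(sum(th[k] for k in risk_topics) for th in thetas) / len(thetas)
        scores[country] = 1.0 - risk
    return scores


def classify(score, cuts=(0.8, 0.6, 0.4)):
    """Map an index onto the four classes named in the abstract (cut-offs are hypothetical)."""
    if score >= cuts[0]:
        return "stable"
    if score >= cuts[1]:
        return "relatively stable"
    if score >= cuts[2]:
        return "relatively high risk"
    return "high risk"


# Toy usage: topic 0 = economy, topic 1 = conflict (labels assigned by hand).
ECONOMY, CONFLICT = 0, 1
docs_a = [["trade", "port"], ["gas", "export"], ["rail", "loan"], ["protest", "police"]]
docs_b = [["attack", "bomb"], ["clash", "army"], ["militant", "raid"], ["wheat", "trade"]]
labels_a = [{ECONOMY}, {ECONOMY}, {ECONOMY}, {CONFLICT}]
labels_b = [{CONFLICT}, {CONFLICT}, {CONFLICT}, {ECONOMY}]
theta = gibbs_lda(docs_a + docs_b, 2, labels=labels_a + labels_b)
scores = stability_index({"A": theta[:4], "B": theta[4:]}, risk_topics=[CONFLICT])
classes = {c: classify(s) for c, s in scores.items()}
print(scores, classes)  # A -> 0.75 ("relatively stable"), B -> 0.25 ("high risk")
lda_theta = gibbs_lda(docs_a + docs_b, 2)  # unsupervised LDA on the same toy corpus
```

With single-label documents, Labeled LDA assigns every token to the document's one label, so the toy scores are exact; with real multi-label news articles the sampler decides how each article's words are split among its labels, and the unsupervised call lets topics emerge without any labels. Aggregating the per-document proportions by country (and, if desired, by time window) before scoring is one plausible way to compare countries, but the paper's actual aggregation is not stated in the abstract.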

Key words: the Belt and Road Initiative, spatiotemporal data mining, topic model, social development situation, social stability, big data