Progress in Geography ›› 2018, Vol. 37 ›› Issue (10): 1314-1327. doi: 10.18306/dlkxjz.2018.10.002

Special topic: Geographic Big Data

• Column: New Youth in Geography (地理新青年)

Research progress and trends of parallel processing, analysis, and mining of big spatiotemporal data

Xuefeng GUAN, Yumei ZENG*

  1. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
  • Received: 2018-08-31 Revised: 2018-10-13 Online: 2018-10-28 Published: 2018-10-28
  • Contact: Yumei ZENG E-mail: guanxuefeng@whu.edu.cn; zengyumei@whu.edu.cn
  • About the author: Xuefeng GUAN (born 1980), male, from Songzi, Hubei; associate professor; research interest: high-performance geocomputation. E-mail: guanxuefeng@whu.edu.cn
  • Supported by:
    National Natural Science Foundation of China, No.41301411

Abstract:

With the rapid development of the Internet, the Internet of Things, and cloud computing, data tagged with location and time are growing explosively, marking the arrival of the era of big spatiotemporal data. In addition to the typical "4V" characteristics of big data, big spatiotemporal data carry rich semantic information and dynamic spatiotemporal correlations, and they have become an important resource for geographers analyzing the physical geographic environment and sensing patterns of human social activity. Although massive spatiotemporal data have promoted a range of cross-disciplinary studies, traditional data processing and analysis methods can no longer meet the performance requirements of efficient storage and access, real-time processing, and intelligent mining of such data. Integrating big spatiotemporal data with high-performance computing/cloud computing is therefore an inevitable trend. Against this background, this article first traces the origin of big data, reviewing the development of the concept and the distinctive characteristics of big spatiotemporal data. It then analyzes the performance requirements generated by research and applications of big spatiotemporal data and summarizes the current state of the underlying hardware and software platforms. Next, it reviews the state of parallelization from three perspectives (storage and management, spatiotemporal analysis, and domain-specific mining) and discusses the problems that remain. Finally, it concludes with the challenges, opportunities, and development trends of storage, management, and parallel processing and analysis of big spatiotemporal data.

Key words: big spatiotemporal data, high-performance computing, parallel spatial analysis, data mining, progress and trends