地理科学进展 ›› 2013, Vol. 32 ›› Issue (12): 1835-1844.doi: 10.11820/dlkxjz.2013.12.012

• 理论与方法探讨 • 上一篇    下一篇

GIS中8 种图层级多核并行多边形叠置分析工具的实现及优化方法

范俊甫1,2, 马廷1, 季民3, 周玉科1, 许涛1,2   

  1. 1. 中国科学院地理科学与资源研究所资源与环境信息系统国家重点实验室, 北京 100101;
    2. 中国科学院大学, 北京 100049;
    3. 山东科技大学测绘学院, 青岛 266590
  • 收稿日期:2013-07-01 修回日期:2013-10-01 出版日期:2013-12-25 发布日期:2013-12-25
  • 作者简介:范俊甫(1985-),男,山东聊城人,博士生,主要研究方向为高性能地学计算与时空数据挖掘。E-mail:fanjf@lreis.ac.cn
  • 基金资助:
    国家科技支撑计划项目(2011BAH06B03,2011BAH24B10);中国科学院重点部署项目(KZZD-EW-07)。

Implementation and optimization of eight parallel polygon overlapping tools with OpenMP at the feature layer level in GIS

FAN Junfu1,2, MA Ting1, JI Min3, ZHOU Yuke1, XU Tao1,2   

  1. 1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China;
    3. College of Geomatics, Shandong University of Science and Technology, Qingdao 266590, China
  • Received:2013-07-01 Revised:2013-10-01 Online:2013-12-25 Published:2013-12-25

摘要: GIS 中基于简单要素模型的非加权多边形叠置分析有交、差、并、交集取反、联合、更新、标识和空间连接8个基本工具。明确目标图层与叠加图层间多边形的数量对应关系,是实现图层级别并行多边形叠置工具集的首要前提。多边形差、交、标识、更新和空间连接操作需要处理目标多边形到叠加多边形间“一对多”的映射关系;合并、交集取反和联合操作需要处理“多对多”的映射关系。本文从多核数据并行角度,分析了8 种多边形叠加分析工具并行实现方法的异同,提出基于改进的分组关联最小化方法实现数据划分,基于顶点数量作为指标的负载平衡计算策略和多种并行优化方法和策略,实现了包含8 种操作的并行多边形叠置分析工具集。实验结果表明,改进的分组关联最小化数据划分方法能为多边形联合操作带来约92%的并行加速性能和更鲁棒的并行性;以顶点数量作为负载平衡指标,能以极小的代价为并行求差算法获得约21%的性能提升;二路归并能有效解决多边形并行合并过程中潜在的性能瓶颈;动态调度策略下多边形求交与合并工具具有更高的加速比;使用R树进行要素预过滤能为并行求差获得超过20 倍的加速;结构化存储的矢量数据批量加载策略能有效降低因磁盘I/O 带来的性能损失。

关键词: OpenMP, 并行优化, 多边形并行叠置, 负载平衡, 任务调度, 数据划分

Abstract: Simple feature model-based non-weighted polygon overlapping analysis operations include eight basic tools: intersection, difference, merge, symmetrical difference, union, update, identity, and spatial join. It is assumed that the determination of the "one-to-many" or "many-to-many" mapping relationships between polygons of the two overlapping layers is the primary prerequisite to the implementation of parallel polygon overlay toolset at the feature layer level. Polygon difference, intersection, identity, update, and spatial join algorithms must address the "one-to-many" mapping relationships between polygons of the overlapping layers. However, the "many-to-many" relationships must be handled by polygon merge, symmetrical difference, and union algorithms. In this research, we analyzed the differences and similarities among the parallel implementation approaches and optimization methods of the eight polygon overlapping tools from the perspective of data parallelism with the OpenMP parallel programming model. We proposed an improved group-relation-minimizing data partition method to realize complex data decomposition without geometry cutting and sewing, and adopted a vertex number indicator-based strategy for load balancing, and several optimization approaches, including the method of avoiding potential bottleneck in polygon merging, strategy for addressing the defect in polygon symmetrical differencing using the XOR operator of the Vatti polygon clipping algorithm, and the strategies of pre-filtering of features with R-tree and bulk loading of structural stored vector data in MySQL. We developed and implemented the parallel polygon overlapping toolset by systematically summarizing the applicative data decomposition method, characters of overlay operations, the Vatti polygon clipping algorithm, load balancing and parallel schedule strategies. The logical flow of parallel polygon union algorithm was introduced as an example to describe the general process of designing parallel polygon overlapping tools. Parallel overlapping case study was also conducted to analyze the effectiveness of the implementation and optimization methods. The experimental results show that the improved group-relation-minimizing data partition method can bring approximately 92% parallel acceleration and more parallel robustness for polygon union; different expected group size specified in the data division method proposed in this research can lead to different parallel speedup benchmarks; the vertex-number based load balancing strategy can give the parallel difference tool about 21% performance improvement; the two-way merge algorithm applied in the parallel merge tool can address the potential performance bottleneck; the dynamic schedule strategy of OpenMP can bring higher speedup than the static and guided schedule strategies; the pre-filtering approach can bring more than 20-fold acceleration for parallel polygon difference; the bulk loading strategy of structural stored vector data can effectively reduce the performance loss due to the disk I/O. The implementation approaches and optimization methods introduced in this study show different applicable features on the developing of the eight overlapping tools. The methods and strategies introduced above can be potential alternatives when launching similar parallelization tasks of other spatial analysis algorithms. This research not only provided theoretical basis and guidance of methodologies for the design and development of parallel polygon overlapping tools at the feature layer level in the multi-core environment, but also presented important practical significance in improving the system resource utilization and computational efficiency of spatial analyses for massive personal and low-cost GIS applications.

Key words: data partition, load balance, OpenMP, parallel optimization, parallel polygon overlapping, task schedule