Progress in Geography, 2020, Vol. 39, Issue (7): 1140-1148. doi: 10.18306/dlkxjz.2020.07.007


Development and application test of a collection system for paleoclimate research documents from ResearchGate

ZHANG Xuezhen1,2, YIN Jun1, BAI Mengxin1,2, LI Yanbo1, ZHENG Jingyun1,2,*

  1. Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2019-05-23; Revised: 2019-10-09; Online: 2020-07-28; Published: 2020-09-28
  • About the author: ZHANG Xuezhen (born 1981), male, from Jining, Shandong; research professor, working mainly on climate change. E-mail: xzzhang@igsnrr.ac.cn
  • Supported by:
    National Key Research and Development Program of China (2017YFA0603301); National Natural Science Foundation of China (41430528); Key Project of the Chinese Academy of Sciences (ZDRW-ZS-2017-4); Key Research Program of Frontier Sciences, CAS (QYZDB-SSW-DQC005); Youth Innovation Promotion Association, CAS (2015038)


Abstract:

A ResearchGate-oriented collection system for paleoclimate research documents (CSPD) was developed in this study using Python (V3.6) and MySQL (V5.7) on the Linux platform. In parallel, 1450 paleoclimate reconstruction papers were manually selected from the paleoclimate database of the National Climatic Data Center (NCDC; https://www.ncdc.noaa.gov). The keywords of these papers were classified and summarized to build a preliminary keyword list for retrieving paleoclimate literature. Using the CSPD with this keyword list, we retrieved 32493 paleoclimate research papers from ResearchGate. To verify the validity of the CSPD and the keyword list, we counted keyword frequencies in four dimensions (temporal scale, type of proxy data, meteorological variable, and study area or country) for the 32493 ResearchGate papers and for the 1450 NCDC papers, and compared the two sets of frequencies. The relative differences among keyword frequencies were largely consistent between the two document datasets. This indicates that, with the preliminary keyword list, the paleoclimate reconstruction papers retrieved from ResearchGate are valid and reflect the current state of paleoclimate reconstruction research. This large body of papers provides a rich data source for the next step: collecting paleoclimate reconstruction results that NCDC has not archived. The CSPD achieved its design objectives.
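The abstract does not say how the two sets of keyword frequencies were compared. As a minimal sketch, assume each paper is represented by the set of keywords it matched and that "relative frequency" means a keyword's share of all keyword hits within its dimension. Under those assumptions, the comparison could look like the following. The keyword table, the helper names (`relative_frequencies`, `compare`), and the use of Spearman rank correlation as the agreement measure are all hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter
from math import sqrt

# Hypothetical keyword table mapping each keyword to one of the four
# dimensions named in the abstract; the authors' actual list is not shown here.
KEYWORD_DIMENSION = {
    "holocene": "temporal scale",
    "last millennium": "temporal scale",
    "last glacial maximum": "temporal scale",
    "pollen": "proxy type",
    "tree ring": "proxy type",
    "ice core": "proxy type",
    "temperature": "meteorological variable",
    "precipitation": "meteorological variable",
    "china": "study area",
    "usa": "study area",
}


def relative_frequencies(papers, table, dimension):
    """Return each keyword's share of all keyword hits within `dimension`.

    `papers` is an iterable of keyword collections, one per paper. A paper
    contributes at most one hit per keyword (an assumption of this sketch).
    """
    counts = Counter()
    for keywords in papers:
        for kw in set(keywords):
            if table.get(kw) == dimension:
                counts[kw] += 1
    total = sum(counts.values())
    names = [kw for kw, dim in table.items() if dim == dimension]
    return {kw: (counts[kw] / total if total else 0.0) for kw in names}


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy) if vx and vy else float("nan")


def compare(papers_a, papers_b, table, dimension):
    """Rank agreement of keyword shares in one dimension across two datasets."""
    fa = relative_frequencies(papers_a, table, dimension)
    fb = relative_frequencies(papers_b, table, dimension)
    keys = sorted(fa)
    return spearman([fa[k] for k in keys], [fb[k] for k in keys])


if __name__ == "__main__":
    # Toy stand-ins for the NCDC and ResearchGate document sets.
    ncdc = [{"pollen", "temperature"}, {"pollen"}, {"tree ring"},
            {"pollen", "tree ring"}, {"ice core"}]
    rg = [{"pollen"}, {"pollen"}, {"pollen", "tree ring"},
          {"tree ring"}, {"ice core"}, {"pollen"}]
    print(compare(ncdc, rg, KEYWORD_DIMENSION, "proxy type"))  # 1.0
```

A value near 1 means both datasets rank the keywords of a dimension in the same order. That is one way to quantify "the relative differences are largely consistent". A rank-based measure is used here because the two datasets differ in size by more than an order of magnitude (32493 vs 1450 papers), so raw counts are not directly comparable.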

Key words: paleoclimate, ResearchGate, document data, collection system, application test