Progress in Geography ›› 2012, Vol. 31 ›› Issue (10): 1307-1317. doi: 10.11820/dlkxjz.2012.10.008
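SONG Ci, PEI Tao
Received:
2011-10-01
Revised:
2012-03-01
Published:
2012-10-25
Online:
2012-10-25
Corresponding author:
PEI Tao (b. 1972), male, associate professor, whose research focuses on spatial data mining and spatial information statistics. E-mail: peit@lreis.ac.cn
About the author:
SONG Ci (b. 1986), male, PhD candidate, whose main research interest is spatial data mining. E-mail: songc@lreis.ac.cn
Funding:
Key Directional Project of the Knowledge Innovation Program of the Chinese Academy of Sciences (KZCX2-YW-QN303); Independent Innovation Project of the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (200905004); 863 Program (2009AA12Z227).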
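Abstract: Time series clustering partitions a set of objects into groups according to their similarity, revealing both the features shared by objects within a group and the differences between groups. When the series are high-dimensional, traditional time series clustering methods are easily affected by noise, a suitable similarity measure is hard to define, and the resulting clusters often lack a clear interpretation. When the data contain missing values or the series have unequal lengths, these methods are also difficult to apply. To address these problems, some researchers have proposed feature-based time series clustering methods, which not only overcome these difficulties but also reveal similarity in the essential characteristics of the series. Organized by the different kinds of time series features, this paper reviews research progress in feature-based time series clustering methods, analyzes and comments on them, and concludes with an outlook on future research.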
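The feature-based idea summarized in the abstract can be illustrated with a minimal sketch. It is not any specific method reviewed in the paper. The feature set (mean, standard deviation, linear trend slope), the helper names `extract_features`, `standardize`, and `kmeans`, and the toy data are all assumptions chosen for illustration. Each series is reduced to a fixed-length feature vector, so series with gaps or unequal lengths can still be compared and clustered.

```python
import math
from statistics import fmean, pstdev

def extract_features(series):
    """Map one series (any length, None = missing) to [mean, std, trend slope]."""
    pts = [(t, x) for t, x in enumerate(series) if x is not None]
    ts = [t for t, _ in pts]
    xs = [x for _, x in pts]
    mean, std = fmean(xs), pstdev(xs)
    # Trend: least-squares slope of value against time index (gaps are skipped).
    t_bar = fmean(ts)
    slope = (sum((t - t_bar) * (x - mean) for t, x in pts)
             / sum((t - t_bar) ** 2 for t in ts))
    return [mean, std, slope]

def standardize(rows):
    """Z-score each feature column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [fmean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [fmean(c) for c in zip(*members)]
    return labels

# Hypothetical toy data: unequal lengths, with None marking missing values.
series = [
    [1.0, 2.0, 3.1, 3.9, 5.0, 6.1],             # rising from a low level
    [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.1],   # flat at a high level
    [0.5, 1.4, None, 3.6, 4.4],                 # rising, shorter, one gap
    [9.9, None, 10.1, 10.0, 9.8],               # flat, shorter, one gap
]
features = standardize([extract_features(s) for s in series])
labels = kmeans(features, k=2)
print(labels)
```

In this toy example the two rising series fall into one cluster and the two flat series into the other, even though no pair of series has the same length. A point-by-point distance on the raw values could not be computed here without first aligning or imputing the data.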
SONG Ci, PEI Tao. Research Progress in Time Series Clustering Methods Based on Characteristics[J]. Progress in Geography, 2012, 31(10): 1307-1317.