Original Articles

Research Progress in Time Series Clustering Methods Based on Characteristics

  • State Key Lab of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China

Received date: 2011-10-01

Revised date: 2012-03-01

Online published: 2012-10-25

Abstract

As terabytes of time series data pour in, increasing attention has been paid to techniques for analyzing them. To understand the differences among these data, time series clustering methods divide series into groups according to their similarity. Because time series are high-dimensional, traditional clustering methods designed for static data are often unsuitable: they are sensitive to noise, and it is hard to define an appropriate similarity measure for them, so they tend to produce meaningless results. Many other methods also struggle to cluster series that have missing values or unequal lengths. Characteristic-based clustering methods can address these problems and capture the essential similarities among time series from multiple perspectives. Organized around the characteristics of time series, this paper reviews research progress on characteristic-based time series clustering methods. First, we introduce the definitions of the different characteristics of time series and classify them. We then review the clustering methods built on each type of characteristic and summarize the general properties of each method. Finally, we discuss deficiencies of existing methods and outline future directions for related research.
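To make the idea of characteristic-based clustering concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a method from the reviewed literature: the four characteristics (level, spread, linear-trend slope, lag-1 autocorrelation) are a small hypothetical subset chosen for brevity, missing points are marked with `None` and simply skipped rather than imputed, and the clustering step is plain k-means on z-scored feature vectors with a deterministic farthest-first initialization. The helper names `features`, `zscore_columns`, `kmeans`, and `cluster_series` are likewise hypothetical. Because every series is reduced to the same fixed-length vector, series of unequal length or with gaps can be compared directly, which a point-by-point Euclidean distance cannot do.

```python
import math
from statistics import fmean, pstdev

def features(series):
    """Hypothetical feature map: one series (any length, None = missing) -> 4 characteristics."""
    obs = [(t, x) for t, x in enumerate(series) if x is not None]  # skip missing points
    ts = [t for t, _ in obs]
    xs = [x for _, x in obs]
    mu, sd = fmean(xs), pstdev(xs)
    # Trend: least-squares slope of value against time index.
    t_mu = fmean(ts)
    sxx = sum((t - t_mu) ** 2 for t in ts)
    slope = sum((t - t_mu) * (x - mu) for t, x in obs) / sxx if sxx else 0.0
    # Serial correlation: lag-1 autocorrelation over adjacent pairs that are both present.
    pairs = [(a, b) for a, b in zip(series, series[1:]) if a is not None and b is not None]
    acf1 = (sum((a - mu) * (b - mu) for a, b in pairs) / (len(xs) * sd * sd)
            if sd and pairs else 0.0)
    return [mu, sd, slope, acf1]

def zscore_columns(rows):
    """Standardize each feature across series so no single feature dominates the distance."""
    stats = [(fmean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s if s else 0.0 for v, (m, s) in zip(row, stats)] for row in rows]

def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-first initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Recompute each center as the mean of its members (keep old center if empty).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([fmean(col) for col in zip(*members)] if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def cluster_series(series_list, k):
    """Characteristic-based clustering: extract features, standardize, then cluster."""
    return kmeans(zscore_columns([features(s) for s in series_list]), k)

# Usage: trending vs. oscillating series of different lengths, each group with one gap.
trending = [[t + 0.3 * (-1) ** t for t in range(n)] for n in (20, 25, 30)]
trending[1][4] = None
oscillating = [[2.0 * (-1) ** t for t in range(n)] for n in (18, 24)]
oscillating[1][5] = None
labels = cluster_series(trending + oscillating, k=2)
print(labels)
```

Swapping in other characteristics (seasonal strength, skewness, an estimate of the Hurst exponent, and so on) would only change `features`; the standardization and clustering steps stay the same.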

Cite this article

SONG Ci, PEI Tao. Research Progress in Time Series Clustering Methods Based on Characteristics[J]. PROGRESS IN GEOGRAPHY, 2012, 31(10): 1307-1317. DOI: 10.11820/dlkxjz.2012.10.008
