A New Approach to Determine Eps Parameter of DBSCAN Algorithm

Fatma Ozge Ozkok, Mete Celik

Abstract

In recent years, data analysis has grown in importance as data volumes have increased. Clustering, which groups objects according to their similarity, plays an important role in data analysis. DBSCAN is one of the most effective and popular density-based clustering algorithms and has been applied successfully in many areas. However, determining the values of its two input parameters, the neighborhood radius (Eps) and the minimum number of points (MinPts), is a challenging task, and these values significantly affect the clustering performance of the algorithm. In this study, we propose the AE-DBSCAN algorithm, which includes a new method for determining the value of the neighborhood radius Eps automatically. The experimental evaluations showed that the proposed method outperformed the analytical DBSCAN.
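To make the roles of Eps and MinPts concrete, the following is a minimal Python sketch of the sorted k-distance heuristic that is commonly used to pick Eps by hand. It is not the AE-DBSCAN method, whose details are not given in this abstract. The function names `k_distances` and `core_points` and the toy data are hypothetical and chosen only for illustration.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted descending."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)


def core_points(points, eps, min_pts):
    """Indices of points whose Eps-neighbourhood (the point itself included)
    contains at least min_pts points."""
    cores = []
    for i, p in enumerate(points):
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours >= min_pts:
            cores.append(i)
    return cores


# Hypothetical toy data: two tight groups and one isolated point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (50.0, 50.0),
]

# A sharp drop in the sorted k-distance list suggests a candidate Eps:
# points before the drop are likely noise, points after it lie in dense regions.
kdist = k_distances(points, k=3)
print([round(d, 3) for d in kdist])

# Assumed parameter values, picked just above the plateau of the k-distance list.
print(core_points(points, eps=1.5, min_pts=4))
```

On this toy data the isolated point has a far larger 3-distance than the grouped points, so any Eps chosen between the two levels separates noise from the dense groups. Choosing that threshold by eye is the manual step that automatic methods aim to replace.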

Keywords

AE-DBSCAN; Clustering; Data Mining; Density-Based Clustering

Submitted: 2017-08-25 14:51:35
Published: 2017-12-12 13:20:45


Copyright (c) 2017 International Journal of Intelligent Systems and Applications in Engineering

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
 