Impact of Dataset Scaling on Hierarchical Clustering: A Comparative Analysis of Distance-Based and Ratio-Based Methods

Ali Rashash R. Alzahrani

Abstract

This study compares distance-based agglomerative hierarchical clustering techniques with a ratio-based approach, using two real datasets that were also analyzed by Roux (2018). The type of scaling applied to the datasets was found to affect the hierarchical clustering results, so various scaling methods were applied before clustering. Two rank-based goodness-of-fit measures were used to evaluate the clustering methods. In contrast to the findings of Roux (2018), distance-based methods such as median linkage, average linkage, and centroid linkage performed better than the ratio-based method. The ratio-based method also showed branch crossing in the hierarchical clustering dendrogram. This study therefore illustrates that, with appropriate dataset scaling, distance-based methods outperform ratio-based methods in terms of goodness-of-fit measures.
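The workflow summarized above (scale the data, cluster it, then score the dendrogram with a rank-based measure) can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's implementation: the toy data are invented, z-scoring stands in for whichever scaling methods the paper used, average linkage is one of the distance-based methods named in the abstract, and the Goodman-Kruskal gamma is assumed as the rank-based measure. All function names are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale each column to zero mean and unit population standard deviation.

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def pairwise_distances(rows):
    """Euclidean distance for every unordered pair of rows, keyed by (i, j), i < j."""
    return {(i, j): dist(rows[i], rows[j])
            for i, j in combinations(range(len(rows)), 2)}


def average_linkage_cophenetic(d, n):
    """Naive average-linkage clustering of n points with distances d.

    Returns the cophenetic distance (merge height) for every pair of points.
    """
    clusters = {i: [i] for i in range(n)}
    coph = {}
    while len(clusters) > 1:
        best = None
        # Pick the two clusters with the smallest mean inter-point distance.
        for a, b in combinations(clusters, 2):
            h = mean(d[min(i, j), max(i, j)]
                     for i in clusters[a] for j in clusters[b])
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        # Points joined by this merge share the merge height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[min(i, j), max(i, j)] = h
        clusters[a] = clusters[a] + clusters.pop(b)
    return coph


def goodman_kruskal_gamma(x, y):
    """(concordant - discordant) / (concordant + discordant); tied pairs are ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def fit_quality(rows):
    """Gamma between the input distances and the dendrogram's cophenetic distances."""
    d = pairwise_distances(rows)
    coph = average_linkage_cophenetic(d, len(rows))
    pairs = sorted(d)
    return goodman_kruskal_gamma([d[p] for p in pairs], [coph[p] for p in pairs])


# Invented toy data: the second feature is on a much larger scale than the first.
data = [[1.0, 100.0], [1.2, 300.0], [5.0, 120.0], [5.3, 310.0], [9.0, 200.0]]

raw_gamma = fit_quality(data)
scaled_gamma = fit_quality(zscore_columns(data))
print(f"gamma without scaling: {raw_gamma:.3f}")
print(f"gamma after z-scoring: {scaled_gamma:.3f}")
```

Without scaling, the large-range feature dominates the Euclidean distances, so the two runs can produce different dendrograms and different gamma values. Comparing the two printed scores is a small-scale analogue of the comparison the study reports.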

References

  1. A.K. Jain, Data Clustering: 50 Years Beyond K-Means, Pattern Recognit. Lett. 31 (2010), 651-666. https://doi.org/10.1016/j.patrec.2009.09.011.
  2. A. Bouguettaya, Q. Yu, X. Liu, X. Zhou, A. Song, Efficient Agglomerative Hierarchical Clustering, Expert Syst. Appl. 42 (2015), 2785-2797. https://doi.org/10.1016/j.eswa.2014.09.054.
  3. H. Mittal, A.K. Tripathi, A.C. Pandey, P. Venu, V.G. Menon, R. Pal, A Novel Fuzzy Clustering-Based Method for Human Activity Recognition in Cloud-Based Industrial IoT Environment, Wireless Netw. (2022). https://doi.org/10.1007/s11276-022-03011-y.
  4. M.G. Kendall, A New Measure of Rank Correlation, Biometrika 30 (1938), 81-93. https://doi.org/10.2307/2332226.
  5. L. Goodman, W. Kruskal, Measures of Association for Cross Classifications, J. Amer. Stat. Assoc. 49 (1954), 732-764.
  6. M. Roux, A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms, J. Classif. 35 (2018), 345-366. https://doi.org/10.1007/s00357-018-9259-9.
  7. A. Tubb, A.J. Parker, G. Nickless, The Analysis of Romano-British Pottery by Atomic Absorption Spectrophotometry, Archaeometry 22 (1980), 153-171. https://doi.org/10.1111/j.1475-4754.1980.tb00939.x.
  8. R.A. Fisher, The Use of Multiple Measurements in Taxonomic Problems, Ann. Eugenics 7 (1936), 179-188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x.
  9. R.A. Mollineda, E. Vidal, A Relative Approach to Hierarchical Clustering, in: Pattern Recognition and Applications, vol. 56, IOS Press, Amsterdam, 2000, pp. 19-28.