Pertanika Journal

Statistical Estimators as an Alternative to Standard Deviation in Weighted Euclidean Distance Cluster Analysis

Paul Inuwa Dalatu and Habshah Midi

Pertanika Journal of Science & Technology, Volume 26, Issue 4, October 2018

Keywords: Clustering, estimators, K-Means, simulation, weighted

Published on: 24 Oct 2018

Abstract

Clustering is one of the primary data mining tools. It helps researchers understand the natural grouping of attributes in datasets. Clustering is an unsupervised classification method whose main aim is partitioning: objects in the same cluster are similar, whereas objects that belong to different clusters differ significantly with respect to their attributes. However, the classical standardized Euclidean distance, which uses the standard deviation to down-weight the maximum points of the ith feature in distance-based clustering, has been criticized by many scholars because it is sensitive to outliers, lacks robustness, and has a 0% breakdown point. It also has low efficiency under the normal distribution. To remedy these problems, we propose two statistical estimators with a 50% breakdown point, namely the Sn and Qn estimators, which have 58% and 82% efficiency, respectively. The proposed methods outperformed the existing methods in down-weighting the maximum points of the ith feature in distance-based cluster analysis.

ISSN 0128-7680

e-ISSN 2231-8526

Article ID JST-1003-2018