The Silhouette Width measures how well each observation fits within its assigned cluster compared to neighboring clusters. For each observation, the silhouette value is defined as \(s(i) = (b(i) - a(i)) / \max(a(i), b(i))\) where \(a(i)\) is the average distance to all other observations in the same cluster and \(b(i)\) is the minimum average distance to observations in any other cluster. The score returned is the mean silhouette width across all observations. Values close to 1 indicate well-clustered observations, values near 0 indicate observations on cluster boundaries, and negative values indicate possible misclassification.
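The definition above can be traced by hand on a small example. The following is a minimal base R sketch, not the code the measure runs internally: the toy data, the cluster labels, and the helper `silhouette_width()` are all made up for illustration, and the sketch assumes every cluster holds at least two observations.

```r
# Toy data: two well-separated groups on a line (illustrative values only)
x <- c(1, 2, 3, 10, 11, 12)
clusters <- c(1, 1, 1, 2, 2, 2)

# Pairwise Euclidean distances as a full matrix
d <- as.matrix(dist(x))

# Hypothetical helper: silhouette value s(i) for observation i
silhouette_width <- function(i, d, clusters) {
  own <- clusters == clusters[i]
  own[i] <- FALSE  # a(i) excludes the observation itself
  a <- mean(d[i, own])
  # b(i): smallest mean distance to the members of any other cluster
  others <- setdiff(unique(clusters), clusters[i])
  b <- min(vapply(others, function(k) mean(d[i, clusters == k]), numeric(1)))
  (b - a) / max(a, b)
}

s <- vapply(seq_along(x), silhouette_width, numeric(1), d = d, clusters = clusters)
mean_s <- mean(s)  # mean silhouette width across all observations
print(round(s, 4))
print(mean_s)
```

For the first observation, \(a(1) = 1.5\) and \(b(1) = 10\), so \(s(1) = 8.5 / 10 = 0.85\). Because the two groups are far apart, every value in `s` is close to 1.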

The score function calls cluster::silhouette() from package cluster.

Details

If the task contains factor or ordered features, Gower distances (cluster::daisy()) are used instead of Euclidean distances.
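As a rough illustration of the Gower case, the sketch below calls cluster::daisy() and cluster::silhouette() directly on a small mixed-type data frame. The data frame `df` and the vector `partition` are hypothetical, and the measure's internal code may differ in its details.

```r
library(cluster)

# Hypothetical mixed-type data: one numeric and one factor column
df <- data.frame(
  height = c(150, 152, 180, 183),
  group = factor(c("a", "a", "b", "b"))
)

# Hypothetical cluster assignment, one integer label per row
partition <- c(1L, 1L, 2L, 2L)

# Gower dissimilarities handle numeric and factor columns together
d_gower <- daisy(df, metric = "gower")

# Silhouette values per observation, then their mean
sil <- silhouette(partition, d_gower)
mean_sil <- mean(sil[, "sil_width"])
print(mean_sil)
```

With purely numeric features, `dist(df)` would take the place of `daisy()` to obtain Euclidean distances.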

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.silhouette")
msr("clust.silhouette")
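A typical scoring workflow might look like the sketch below. It assumes the `usarrests` task and the `clust.kmeans` learner shipped with mlr3cluster are available; the choice of three centers is arbitrary. Clustering measures need access to the data, so the task is passed to `$score()`.

```r
library(mlr3)
library(mlr3cluster)

# Example task and learner; centers = 3 is an arbitrary choice
task <- tsk("usarrests")
learner <- lrn("clust.kmeans", centers = 3)

# Fit the clustering and obtain a partition for the same data
learner$train(task)
prediction <- learner$predict(task)

# The measure needs the task to compute distances between observations
score <- prediction$score(msr("clust.silhouette"), task = task)
print(score)
```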

Meta Information

  • Task type: “clust”

  • Range: \([-1, 1]\)

  • Minimize: FALSE

  • Average: macro

  • Required Prediction: “partition”

  • Required Packages: mlr3, mlr3cluster, cluster

References

Rousseeuw, P J (1987). “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis.” Journal of Computational and Applied Mathematics, 20, 53–65. doi:10.1016/0377-0427(87)90125-7.