
The Calinski-Harabasz index (also known as the Variance Ratio Criterion) is the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom. It is defined as \(CH = \frac{\mathrm{tr}(B) / (k - 1)}{\mathrm{tr}(W) / (n - k)}\) where \(B\) is the between-cluster scatter matrix, \(W\) is the within-cluster scatter matrix, \(k\) is the number of clusters, and \(n\) is the number of observations. The formula requires \(1 < k < n\), since either denominator is zero otherwise. Higher values indicate more compact, better-separated clusters.
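To make the formula concrete, the following is a minimal base-R sketch that computes the index directly from the definition. The helper `ch_index()` is hypothetical and written only for illustration; it is not the implementation used by mlr3cluster. The data set (`USArrests`, scaled) and the choice of three k-means clusters are likewise assumptions made for the example.

```r
# Hypothetical helper for illustration; not part of mlr3cluster.
ch_index = function(x, cluster) {
  x = as.matrix(x)
  n = nrow(x)
  groups = split(seq_len(n), cluster)  # row indices per cluster
  k = length(groups)
  overall_mean = colMeans(x)

  # tr(W): squared distances of observations to their own cluster centroid
  tr_w = sum(vapply(groups, function(idx) {
    xi = x[idx, , drop = FALSE]
    sum(sweep(xi, 2, colMeans(xi))^2)
  }, numeric(1)))

  # tr(B): size-weighted squared distances of centroids to the overall mean
  tr_b = sum(vapply(groups, function(idx) {
    centroid = colMeans(x[idx, , drop = FALSE])
    length(idx) * sum((centroid - overall_mean)^2)
  }, numeric(1)))

  (tr_b / (k - 1)) / (tr_w / (n - k))
}

set.seed(1)
x = scale(USArrests)
fit = kmeans(x, centers = 3)

ch = ch_index(x, fit$cluster)

# Cross-check: kmeans() reports tr(B) as betweenss and tr(W) as tot.withinss
ch_kmeans = (fit$betweenss / (3 - 1)) / (fit$tot.withinss / (nrow(x) - 3))
```

Both `ch` and `ch_kmeans` should agree up to floating-point error, because the sums of squares reported by `kmeans()` are the traces of the two scatter matrices.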

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.ch")
msr("clust.ch")
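As a usage sketch, the measure can be passed to a clustering prediction's `$score()` method. The task (`usarrests`), the learner (`clust.kmeans`), and the number of centers below are assumptions chosen for illustration. Clustering measures are computed from the data as well as the partition, so the task is passed to `$score()` too.

```r
library(mlr3)
library(mlr3cluster)

# Example task and learner; any clustering task and partitioning learner should work
task = tsk("usarrests")
learner = lrn("clust.kmeans", centers = 3)

# Fit the clustering and obtain a partition for the same observations
learner$train(task)
prediction = learner$predict(task)

# Score the partition; higher values indicate better-defined clusters
score = prediction$score(msr("clust.ch"), task = task)
print(score)
```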

Meta Information

  • Task type: “clust”

  • Range: \([0, \infty)\)

  • Minimize: FALSE

  • Average: macro

  • Required Prediction: “partition”

  • Required Packages: mlr3, mlr3cluster

References

Caliński, Tadeusz, Harabasz, Jerzy (1974). “A dendrite method for cluster analysis.” Communications in Statistics, 3(1), 1–27. doi:10.1080/03610927408827101.