The Calinski-Harabasz index (also known as the Variance Ratio Criterion) is the ratio of between-cluster variance to within-cluster variance, adjusted for the number of clusters and observations. It is defined as \(CH = \frac{\mathrm{tr}(B) / (k - 1)}{\mathrm{tr}(W) / (n - k)}\) where \(B\) is the between-cluster scatter matrix, \(W\) is the within-cluster scatter matrix, \(k\) is the number of clusters, and \(n\) is the number of observations. Higher values indicate better-defined clusters.
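For reference, the two traces in the formula can be written out in terms of the data. The extra notation is introduced here for illustration only: \(x_i\) is an observation, \(C_j\) the set of observations assigned to cluster \(j\), \(n_j\) its size, \(c_j\) its centroid, and \(c\) the centroid of all observations. This is the standard textbook form and not necessarily how the implementation computes the index.

```latex
\[
\mathrm{tr}(B) = \sum_{j=1}^{k} n_j \,\lVert c_j - c \rVert^2,
\qquad
\mathrm{tr}(W) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2
\]
```

Because \(\mathrm{tr}(B)\) is divided by \(k - 1\) and \(\mathrm{tr}(W)\) by \(n - k\), the index is only defined for \(1 < k < n\).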
Dictionary
This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures, as mlr_measures$get("clust.ch"), or with the associated sugar function mlr3::msr(), as msr("clust.ch").
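A minimal usage sketch, assuming the measure is registered under the key "clust.ch" (consistent with the mlr_measures_clust.* naming of the related measures). The task ("usarrests"), the learner ("clust.kmeans"), and the choice of three centers are arbitrary illustrations and are not prescribed by this page.

```r
library(mlr3)
library(mlr3cluster)

# Both calls retrieve the same measure from the dictionary.
measure = mlr_measures$get("clust.ch")
measure = msr("clust.ch")

# Illustrative run: the task, learner, and centers = 3 are arbitrary choices.
task = tsk("usarrests")
learner = lrn("clust.kmeans", centers = 3)
learner$train(task)
prediction = learner$predict(task)

# Cluster measures are computed from the feature data, so the task is passed along.
prediction$score(measure, task = task)
```

Since Minimize is FALSE, a partition with a larger score is preferred when comparing, for example, different numbers of clusters on the same task.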
Meta Information
Task type: “clust”
Range: \([0, \infty)\)
Minimize: FALSE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster
References
Caliński, Tadeusz, Harabasz, Jerzy (1974). “A dendrite method for cluster analysis.” Communications in Statistics, 3(1), 1–27. doi:10.1080/03610927408827101.
See also
Dictionary of Measures: mlr3::mlr_measures
as.data.table(mlr_measures) for a complete table of all (also dynamically created) mlr3::Measure implementations.
Other cluster measures:
mlr_measures_clust.avg_between,
mlr_measures_clust.avg_within,
mlr_measures_clust.davies_bouldin,
mlr_measures_clust.dunn,
mlr_measures_clust.dunn2,
mlr_measures_clust.entropy,
mlr_measures_clust.pearsongamma,
mlr_measures_clust.silhouette,
mlr_measures_clust.wb_ratio,
mlr_measures_clust.wss