
Package website: release | dev

Cluster analysis for mlr3.

mlr3cluster is an extension package for cluster analysis within the mlr3 ecosystem. It is the successor to the clustering capabilities of mlr2.

Installation

Install the latest release from CRAN:

install.packages("mlr3cluster")

Install the development version from GitHub:

# install.packages("pak")
pak::pak("mlr-org/mlr3cluster")

Feature Overview

The current version of mlr3cluster contains:

  • A selection of 24 clustering learners covering a wide variety of approaches: partitional, hierarchical, fuzzy, density-based, etc.
  • A selection of 4 performance measures
  • Two built-in tasks to get started with clustering
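The built-in tasks are meant for getting started. For your own data, a clustering task can be built directly from a data frame. The following is a minimal sketch, assuming the `TaskClust$new(id, backend)` constructor and using the four numeric columns of R's built-in `iris` data; the id `"iris_features"` is an arbitrary label chosen for illustration:

```r
library(mlr3)
library(mlr3cluster)

# Keep only the four numeric measurements; clustering tasks have no target column
data = iris[, 1:4]

# Wrap the data frame in a clustering task (the id is an arbitrary label)
task = TaskClust$new(id = "iris_features", backend = data)

print(task$task_type)      # type of the task
print(task$feature_names)  # the measurement columns used as features
print(task$nrow)           # number of observations
```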

The package is also integrated with mlr3viz, which lets you create visualizations with a single line of code.
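As an illustration of the mlr3viz integration, the sketch below plots a k-means clustering of the built-in US Arrests task. It assumes mlr3viz and the plotting packages it suggests (such as GGally) are installed. The available `type` values of `autoplot()` for clustering predictions have differed between mlr3viz versions, so check the `autoplot.PredictionClust` help page before relying on a specific one; the default type is used here, and `centers = 3` is an arbitrary choice.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3viz)

task = tsk("usarrests")
prediction = lrn("clust.kmeans", centers = 3)$train(task)$predict(task)

# A single call builds a ggplot2-based plot of the cluster assignments
p = autoplot(prediction, task)
print(p)
```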

Cluster Analysis

Cluster Learners

| Key | Label | Packages |
|---|---|---|
| clust.MBatchKMeans | Mini Batch K-Means | ClusterR |
| clust.SimpleKMeans | K-Means (Weka) | RWeka |
| clust.agnes | Agglomerative Nesting | cluster |
| clust.ap | Affinity Propagation | apcluster |
| clust.bico | BICO | stream |
| clust.birch | BIRCH | stream |
| clust.cmeans | Fuzzy C-Means | e1071, clue |
| clust.cobweb | Cobweb | RWeka |
| clust.dbscan | DBSCAN | dbscan |
| clust.dbscan_fpc | DBSCAN (fpc) | fpc |
| clust.diana | Divisive Analysis | cluster |
| clust.em | Expectation-Maximization | RWeka |
| clust.fanny | Fuzzy Analysis | cluster |
| clust.featureless | Featureless Clustering Learner | |
| clust.ff | Farthest First | RWeka |
| clust.hclust | Hierarchical Clustering | stats |
| clust.hdbscan | HDBSCAN | dbscan |
| clust.kkmeans | Kernel K-Means | kernlab |
| clust.kmeans | K-Means | stats, clue |
| clust.mclust | Gaussian Mixture Model | mclust |
| clust.meanshift | Mean Shift | LPCM |
| clust.optics | OPTICS | dbscan |
| clust.pam | Partitioning Around Medoids | cluster, clue |
| clust.xmeans | X-Means | RWeka |
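Each learner in the table is retrieved from the `mlr_learners` dictionary by its key with `lrn()`, and hyperparameters can be set at construction time. The sketch below is illustrative only: `clust.pam` with `k = 3` is an arbitrary pick. Note that the dictionary may also contain `clust.*` learners contributed by other loaded extension packages, so the number of keys found can exceed 24.

```r
library(mlr3)
library(mlr3cluster)

# Keys of all clustering learners currently registered in the dictionary
clust_keys = mlr_learners$keys("^clust\\.")
print(clust_keys)

# Construct Partitioning Around Medoids with 3 clusters
learner = lrn("clust.pam", k = 3)
print(learner$param_set$values$k)  # hyperparameter value set above
print(learner$packages)            # packages required to train this learner

# Train on the built-in task and look at the first cluster assignments
task = tsk("usarrests")
prediction = learner$train(task)$predict(task)
print(head(prediction$partition))
```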

Cluster Measures

| Key | Label | Packages |
|---|---|---|
| clust.ch | Calinski Harabasz | fpc |
| clust.dunn | Dunn | fpc |
| clust.silhouette | Silhouette | cluster |
| clust.wss | Within Sum of Squares | fpc |
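All four measures are internal validity indices: they are computed from the feature values and the cluster assignments alone, which is why `score()` needs the task as well as the prediction. A sketch scoring one k-means prediction with all four at once; the seed and `centers = 3` are arbitrary choices, and the `minimize` field is read to show which direction counts as better for each measure:

```r
library(mlr3)
library(mlr3cluster)

set.seed(42)  # k-means uses random starting centers
task = tsk("usarrests")
prediction = lrn("clust.kmeans", centers = 3)$train(task)$predict(task)

# Score the prediction with all four internal measures
measures = msrs(c("clust.ch", "clust.dunn", "clust.silhouette", "clust.wss"))
scores = prediction$score(measures, task)
print(scores)

# TRUE means a lower value indicates a better clustering, FALSE means higher is better
direction = vapply(measures, function(m) m$minimize, logical(1))
names(direction) = vapply(measures, function(m) m$id, character(1))
print(direction)
```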

Example

library(mlr3)
library(mlr3cluster)

task = tsk("usarrests")
task
#> 
#> ── <TaskClust> (50x4): US Arrests ──────────────────────────────────────────────
#> • Target:
#> • Properties: -
#> • Features (4):
#>   • int (2): Assault, UrbanPop
#>   • dbl (2): Murder, Rape

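learner = lrn("clust.kmeans")                        # k-means from the stats package
prediction = learner$train(task)$predict(task)       # train and predict on the same data
measures = msrs(c("clust.wss", "clust.silhouette"))
prediction$score(measures, task)                     # internal measures need the task's features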
#>        clust.wss clust.silhouette 
#>     9.639903e+04     5.926554e-01
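Because these measures need no ground-truth labels, they can be used to compare clusterings with different numbers of clusters. The sketch below fits k-means for k = 2 to 5 on the same task and records the silhouette score of each fit. Choosing the k with the highest silhouette is a common heuristic, not something mlr3cluster prescribes; the seed and the range of k are arbitrary.

```r
library(mlr3)
library(mlr3cluster)

set.seed(1)  # k-means results depend on random starting centers
task = tsk("usarrests")
ks = 2:5

# Fit one k-means model per candidate k and score it with the silhouette measure
sil = sapply(ks, function(k) {
  learner = lrn("clust.kmeans", centers = k)
  prediction = learner$train(task)$predict(task)
  prediction$score(msr("clust.silhouette"), task)
})
names(sil) = paste0("k=", ks)
print(sil)

# Candidate k with the largest silhouette score
best_k = ks[which.max(sil)]
print(best_k)
```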

More Resources

Check out the blog post for a more detailed introduction to the package. The mlr3book also has a section on clustering.

Future Plans

  • Add more learners and measures
  • Integrate the package with mlr3pipelines (work in progress)

If you have any questions, feedback, or ideas, feel free to open an issue in the mlr-org/mlr3cluster GitHub repository.