Cluster analysis for mlr3

mlr3cluster is an extension package for cluster analysis within the mlr3 ecosystem. It is the successor to the clustering capabilities of mlr2.

Installation

Install the latest release from CRAN:

install.packages("mlr3cluster")

Install the development version from GitHub:

devtools::install_github("mlr-org/mlr3cluster")

Feature Overview

The current version of mlr3cluster contains:

  • A selection of 22 clustering learners that represent a wide variety of clusterers: partitional, hierarchical, fuzzy, etc.
  • A selection of 4 performance measures
  • Two built-in tasks to get started with clustering

The package also integrates with mlr3viz, which lets you visualize clustering results with a single line of code.
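As a rough illustration of the mlr3viz integration, the sketch below trains a k-means learner and plots its predictions with `autoplot()`. The choice of `centers = 3L` and the fixed seed are assumptions made only for this example, and depending on the plot type, additional packages (for example GGally for the default scatter plot) may need to be installed.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3viz)

set.seed(1) # k-means starts from random centers; fix the seed for repeatability

task = tsk("usarrests")
learner = lrn("clust.kmeans", centers = 3L) # 3 clusters is an arbitrary choice
learner$train(task)
preds = learner$predict(task)

# A single call draws the plot, colored by cluster assignment.
# The default plot type is a scatter plot matrix of the task's features.
p = autoplot(preds, task)
print(p)
```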

Cluster Analysis

Cluster Learners

| ID | Learner | Package |
| --- | --- | --- |
| clust.agnes | Agglomerative Hierarchical Clustering | cluster |
| clust.ap | Affinity Propagation Clustering | apcluster |
| clust.cmeans | Fuzzy C-Means Clustering | e1071 |
| clust.cobweb | Cobweb Clustering Algorithm | RWeka |
| clust.dbscan | Density-based Clustering | dbscan |
| clust.dbscan_fpc | Density-based Clustering with fpc | fpc |
| clust.diana | Divisive Hierarchical Clustering | cluster |
| clust.em | Expectation-Maximization Clustering | RWeka |
| clust.fanny | Fuzzy Clustering | cluster |
| clust.featureless | Simple Featureless Clustering | mlr3cluster |
| clust.ff | FarthestFirst Clustering Algorithm | RWeka |
| clust.hdbscan | HDBSCAN Clustering | dbscan |
| clust.hclust | Agglomerative Hierarchical Clustering | stats |
| clust.kkmeans | Kernel K-Means Clustering | kernlab |
| clust.kmeans | K-Means Clustering | stats |
| clust.mclust | Gaussian Mixture Models-Based Clustering | mclust |
| clust.MBatchKMeans | Mini Batch K-Means Clustering | ClusterR |
| clust.meanshift | Mean Shift Clustering | LPCM |
| clust.optics | OPTICS Clustering | dbscan |
| clust.pam | Clustering Around Medoids | cluster |
| clust.SimpleKMeans | K-Means Clustering (WEKA) | RWeka |
| clust.xmeans | K-Means with Automatic Determination of k | RWeka |
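The learner IDs above can also be looked up programmatically. The following is a minimal sketch that queries the mlr3 learner dictionary for clustering learners and constructs one with a hyperparameter set at construction time. The value `centers = 3L` is an assumption for illustration only.

```r
library(mlr3)
library(mlr3cluster)

# Keys of all registered learners whose ID starts with "clust."
clust_ids = mlr_learners$keys("^clust\\.")
print(clust_ids)

# Construct a learner by ID and set a hyperparameter in the same call
learner = lrn("clust.kmeans", centers = 3L)

# Inspect the configured hyperparameters and the packages the learner needs
print(learner$param_set$values)
print(learner$packages)
```

Listing the keys does not load the underlying packages. A learner's backing package (the Package column) has to be installed before that learner can be constructed and trained.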

Cluster Measures

| ID | Measure | Package |
| --- | --- | --- |
| clust.dunn | Dunn index | fpc |
| clust.ch | Calinski Harabasz Pseudo F-Statistic | fpc |
| clust.silhouette | Rousseeuw’s Silhouette Quality Index | cluster |
| clust.wss | Within Sum of Squares | fpc |
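A measure from the table can be applied to a prediction roughly as follows. This sketch assumes the cluster package is installed (it backs `clust.silhouette`), and it uses `centers = 3L` and a fixed seed purely for illustration. Because clustering measures are computed from the data itself rather than from ground-truth labels, the task is passed to `$score()`.

```r
library(mlr3)
library(mlr3cluster)

set.seed(1) # k-means starts from random centers; fix the seed for repeatability

task = tsk("usarrests")
learner = lrn("clust.kmeans", centers = 3L) # 3 clusters is an arbitrary choice
learner$train(task)
preds = learner$predict(task)

# Score the clustering; the measure needs the task to access the feature values
sil = preds$score(msr("clust.silhouette"), task = task)
print(sil)
```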

Example

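library(mlr3)
library(mlr3cluster)

# Load a built-in clustering task
task = tsk("usarrests")
# Create a k-means learner with its default hyperparameters
learner = lrn("clust.kmeans")
# Fit the clustering on the task
learner$train(task)
# Assign each observation of the task to a cluster
preds = learner$predict(task = task)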
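To see what the example produces, the prediction object can be inspected along these lines. This is a small sketch that repeats the training steps so it runs on its own. The fixed seed is an assumption added for repeatability.

```r
library(mlr3)
library(mlr3cluster)

set.seed(1) # k-means starts from random centers; fix the seed for repeatability

task = tsk("usarrests")
learner = lrn("clust.kmeans")
learner$train(task)
preds = learner$predict(task)

# One integer cluster label per observation
print(head(preds$partition))

# Number of observations assigned to each cluster
print(table(preds$partition))

# Tabular view with row ids and cluster assignments
print(head(as.data.table(preds)))
```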

More Resources

Check out the blog post for a more detailed introduction to the package. The mlr3book also has a section on clustering.

Future Plans

  • Add more learners and measures
  • Integrate the package with mlr3pipelines (work in progress)

If you have any questions, feedback, or ideas, feel free to open an issue in the mlr-org/mlr3cluster repository on GitHub.