Clustering Large Applications (CLARA) clustering.
Calls cluster::clara() from package cluster.
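CLARA extends the PAM algorithm to larger datasets. Instead of running PAM on all observations, it runs PAM on several random sub-datasets of fixed size, assigns every observation to the nearest medoid of each candidate solution, and keeps the medoids with the lowest average dissimilarity.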
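The subsampling idea can be illustrated with a minimal sketch built on cluster::pam(). The helper clara_sketch() is hypothetical and only approximates what cluster::clara() does: it assumes Euclidean distance and ignores details of the real implementation, such as how subsamples are drawn and how the final partition is reported.

```r
library(cluster)

# Hypothetical helper: a simplified CLARA for illustration only; it is not the
# code of cluster::clara(). Euclidean distance is assumed throughout.
clara_sketch = function(x, k, samples = 5, sampsize = min(nrow(x), 40 + 2 * k)) {
  x = as.matrix(x)
  best = NULL
  for (s in seq_len(samples)) {
    # PAM only sees a random subsample, so this step does not grow with nrow(x)
    idx = sample(nrow(x), sampsize)
    medoids = pam(x[idx, , drop = FALSE], k)$medoids
    # n x k matrix: Euclidean distance of every observation to every medoid
    d = matrix(
      sapply(seq_len(k), function(j) sqrt(colSums((t(x) - medoids[j, ])^2))),
      ncol = k
    )
    # Quality of this candidate: mean distance to the nearest medoid over ALL rows
    objective = mean(apply(d, 1, min))
    if (is.null(best) || objective < best$objective) {
      best = list(
        medoids = medoids,
        clustering = apply(d, 1, which.min),
        objective = objective
      )
    }
  }
  best
}

set.seed(1)
res = clara_sketch(USArrests, k = 2)
table(res$clustering)
res$objective
```

Raising samples trades run time for a better chance of finding good medoids, while sampsize controls how much work each PAM run does.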
The learner sets k to 2 by default because cluster::clara() has no default value for the number of clusters.
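A short sketch of inspecting and overriding the number of clusters through the mlr3 interface, assuming mlr3 and mlr3cluster are installed. lrn() passes named arguments on to the learner's parameter set; the values 3 and 4 are arbitrary.

```r
library(mlr3)
library(mlr3cluster)

# The learner starts with k = 2 because cluster::clara() has no default for k
learner = lrn("clust.clara")
learner$param_set$values$k

# Override k when constructing the learner ...
learner = lrn("clust.clara", k = 3)
# ... or later, on an existing learner
learner$param_set$values$k = 4
learner$param_set$values$k
```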
The predict method uses clue::cl_predict() to compute cluster memberships for new data.
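For Euclidean distance on unstandardized data, predicting with a fitted CLARA model amounts to assigning each new observation to the cluster of its nearest medoid. nearest_medoid() below is a hypothetical helper that sketches this idea. It is not the code of clue::cl_predict() and ignores options such as stand = TRUE or non-Euclidean metrics.

```r
library(cluster)

# Hypothetical helper: label each row of `newdata` with the index of the
# closest medoid (Euclidean distance, unstandardized data assumed).
nearest_medoid = function(fit, newdata) {
  medoids = fit$medoids
  newdata = as.matrix(newdata)
  d = sapply(seq_len(nrow(medoids)), function(j) {
    sqrt(colSums((t(newdata) - medoids[j, ])^2))
  })
  apply(matrix(d, ncol = nrow(medoids)), 1, which.min)
}

fit = clara(USArrests, k = 2)
# On the training data this should reproduce clara's own assignment
all(nearest_medoid(fit, USArrests) == fit$clustering)
```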
Dictionary
This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn().
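Both routes, as a minimal sketch assuming mlr3 and mlr3cluster are installed (loading mlr3cluster is what adds the clust.* learners to the dictionary):

```r
library(mlr3)
library(mlr3cluster)  # registers "clust.clara" in mlr_learners

# Retrieve the learner from the dictionary ...
learner = mlr_learners$get("clust.clara")
# ... or use the sugar function
learner = lrn("clust.clara")
learner$id
```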
Meta Information
Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, cluster, clue
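The meta information listed above can also be queried from a learner object. A short sketch, assuming mlr3 and mlr3cluster are installed; the fields used are standard mlr3::Learner fields.

```r
library(mlr3)
library(mlr3cluster)

learner = lrn("clust.clara")
learner$task_type      # "clust"
learner$predict_types  # "partition"
learner$feature_types  # supported feature types, cf. Meta Information
learner$packages       # packages needed for training and prediction
```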
Parameters
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| k | integer | - | | \([1, \infty)\) |
| metric | character | euclidean | euclidean, manhattan, jaccard | - |
| stand | logical | FALSE | TRUE, FALSE | - |
| samples | integer | 5 | | \([1, \infty)\) |
| sampsize | integer | - | | \([1, \infty)\) |
| trace | integer | 0 | | \([0, \infty)\) |
| medoids.x | logical | TRUE | TRUE, FALSE | - |
| keep.data | logical | TRUE | TRUE, FALSE | - |
| rngR | logical | FALSE | TRUE, FALSE | - |
| pamLike | logical | FALSE | TRUE, FALSE | - |
| correct.d | logical | TRUE | TRUE, FALSE | - |
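Because the learner calls cluster::clara(), the parameters above correspond to arguments of that function. A sketch of a direct call with a few non-default values; the specific values are arbitrary choices for illustration, and the commented mlr3 line assumes mlr3 and mlr3cluster are installed.

```r
library(cluster)

set.seed(123)
fit = clara(
  USArrests,
  k = 3,
  metric = "manhattan",  # L1 distance instead of the default "euclidean"
  stand = TRUE,          # standardize each variable before computing distances
  samples = 10,          # draw 10 subsamples instead of 5
  sampsize = 20,         # each subsample contains 20 observations
  pamLike = TRUE         # use the same swap phase as pam()
)
fit$clusinfo

# Roughly equivalent learner configuration:
# lrn("clust.clara", k = 3, metric = "manhattan", stand = TRUE,
#     samples = 10, sampsize = 20, pamLike = TRUE)
```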
References
Kaufman, Leonard, Rousseeuw, Peter J (2009). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons.
Schubert, Erich, Rousseeuw, Peter J (2019). “Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms.” In Similarity Search and Applications: 12th International Conference, SISAP 2019, Newark, NJ, USA, October 2–4, 2019, Proceedings, 171–187. Springer.
See also
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
Package mlr3extralearners for more learners.
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
mlr3pipelines to combine learners with pre- and postprocessing steps.
Extension packages for additional task types:
mlr3proba for probabilistic supervised regression and survival analysis.
mlr3cluster for unsupervised clustering.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Other Learner:
mlr_learners_clust.MBatchKMeans,
mlr_learners_clust.SimpleKMeans,
mlr_learners_clust.agnes,
mlr_learners_clust.ap,
mlr_learners_clust.bico,
mlr_learners_clust.birch,
mlr_learners_clust.cmeans,
mlr_learners_clust.cobweb,
mlr_learners_clust.dbscan,
mlr_learners_clust.dbscan_fpc,
mlr_learners_clust.diana,
mlr_learners_clust.em,
mlr_learners_clust.fanny,
mlr_learners_clust.featureless,
mlr_learners_clust.ff,
mlr_learners_clust.hclust,
mlr_learners_clust.hdbscan,
mlr_learners_clust.kkmeans,
mlr_learners_clust.kmeans,
mlr_learners_clust.kproto,
mlr_learners_clust.mclust,
mlr_learners_clust.meanshift,
mlr_learners_clust.optics,
mlr_learners_clust.pam,
mlr_learners_clust.protoclust,
mlr_learners_clust.specc,
mlr_learners_clust.xmeans
Super classes
mlr3::Learner -> mlr3cluster::LearnerClust -> LearnerClustCLARA
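The inheritance chain is visible in the R6 class vector of an instance. A short sketch, assuming mlr3 and mlr3cluster are installed:

```r
library(mlr3)
library(mlr3cluster)

learner = lrn("clust.clara")
# Class vector, from most to least specific
class(learner)
inherits(learner, "LearnerClust")
```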
Examples
# Define the Learner and set parameter values
learner = lrn("clust.clara")
print(learner)
#>
#> ── <LearnerClustCLARA> (clust.clara): CLARA ────────────────────────────────────
#> • Model: -
#> • Parameters: k=2
#> • Packages: mlr3, mlr3cluster, cluster, and clue
#> • Predict Types: [partition]
#> • Feature Types: logical, integer, and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: complete, exclusive, and partitional
#> • Other settings: use_weights = 'error'
# Define a Task
task = tsk("usarrests")
# Train the learner on the task
learner$train(task)
# Print the model
print(learner$model)
#> Call: cluster::clara(x = task$data(), k = 2L)
#> Medoids:
#>      Assault Murder Rape UrbanPop
#> [1,]     255   12.1 35.1       74
#> [2,]     115    6.0 18.0       66
#> Objective function: 38.4178
#> Clustering vector: int [1:50] 1 1 1 1 1 1 2 1 1 1 2 2 1 2 2 2 2 1 ...
#> Cluster sizes: 21 29
#> Best sample:
#>  [1]  2  3  4  5  6  8  9 10 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
#> [26] 29 30 32 33 34 35 36 37 38 39 40 41 43 45 46 47 48 49 50
#>
#> Available components:
#> [1] "sample"     "medoids"    "i.med"      "clustering" "objective"
#> [6] "clusinfo"   "diss"       "call"       "silinfo"    "data"
# Make predictions for the task
prediction = learner$predict(task)
# Score the predictions
prediction$score(task = task)
#> clust.dunn
#> 0.1033191
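The trained learner can also assign cluster memberships to data that is not wrapped in a Task. A sketch assuming mlr3 and mlr3cluster are installed; the first five rows of the base R USArrests data merely stand in for new observations.

```r
library(mlr3)
library(mlr3cluster)

task = tsk("usarrests")
learner = lrn("clust.clara")
learner$train(task)

# Predict cluster memberships for a plain data.frame
newdata = USArrests[1:5, ]
prediction = learner$predict_newdata(newdata)
prediction$partition
```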