
Clustering Large Applications (CLARA). Calls cluster::clara() from package cluster.

CLARA extends the PAM algorithm to handle larger datasets by working on sub-datasets of fixed size. The k parameter is set to 2 by default since cluster::clara() doesn't have a default value for the number of clusters. The predict method uses clue::cl_predict() to compute the cluster memberships for new data.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.clara")
lrn("clust.clara")
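As a minimal sketch of the two construction routes, assuming mlr3 and mlr3cluster are installed: lrn() also accepts hyperparameters as named arguments. The values k = 3 and metric = "manhattan" are illustrative choices, not recommendations.

```r
library(mlr3)
library(mlr3cluster)

# Retrieve the learner from the dictionary with its default settings
learner_a = mlr_learners$get("clust.clara")

# Sugar function; hyperparameters can be passed directly (illustrative values)
learner_b = lrn("clust.clara", k = 3L, metric = "manhattan")

# Both objects are instances of the same learner class
class(learner_a)[1]
class(learner_b)[1]

# Hyperparameter values currently set on the second learner
learner_b$param_set$values
```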

Meta Information

  • Task type: “clust”

  • Predict Types: “partition”

  • Feature Types: “logical”, “integer”, “numeric”

  • Required Packages: mlr3, mlr3cluster, cluster, clue

Parameters

Id          Type        Default     Levels                           Range
k           integer     -                                            \([1, \infty)\)
metric      character   euclidean   euclidean, manhattan, jaccard    -
stand       logical     FALSE       TRUE, FALSE                      -
samples     integer     5                                            \([1, \infty)\)
sampsize    integer     -                                            \([1, \infty)\)
trace       integer     0                                            \([0, \infty)\)
medoids.x   logical     TRUE        TRUE, FALSE                      -
keep.data   logical     TRUE        TRUE, FALSE                      -
rngR        logical     FALSE       TRUE, FALSE                      -
pamLike     logical     FALSE       TRUE, FALSE                      -
correct.d   logical     TRUE        TRUE, FALSE                      -
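The parameters can be inspected and changed through the learner's param_set. The following is a sketch under the assumption that a recent paradox version providing set_values() is installed. The chosen values are illustrative only: k is the number of clusters, samples the number of sub-datasets drawn, and sampsize the number of observations in each sub-dataset.

```r
library(mlr3)
library(mlr3cluster)

learner = lrn("clust.clara")

# List the available hyperparameters
print(learner$param_set)

# Illustrative values, not tuned: 3 clusters, 10 sub-datasets of 30 observations each
learner$param_set$set_values(
  k = 3L,
  samples = 10L,
  sampsize = 30L,
  metric = "manhattan"
)

# Fit on the example task used elsewhere on this page
task = tsk("usarrests")
learner$train(task)

# The fitted cluster::clara() object stores one medoid per cluster
learner$model$medoids
```

Since sampsize bounds the cost of each PAM run, it is usually the first parameter to revisit on larger data.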

References

Kaufman, Leonard, Rousseeuw, Peter J (2009). Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.

Schubert, Erich, Rousseeuw, Peter J (2019). “Faster k-medoids clustering: improving the PAM, CLARA, and CLARANS algorithms.” In Similarity Search and Applications: 12th International Conference, SISAP 2019, Newark, NJ, USA, October 2–4, 2019, Proceedings 12, 171–187. Springer.

Super classes

mlr3::Learner -> mlr3cluster::LearnerClust -> LearnerClustCLARA

Methods


Method new()

Creates a new instance of this R6 class.

Usage

LearnerClustCLARA$new()

Method clone()

The objects of this class are cloneable with this method.

Usage

LearnerClustCLARA$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.
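A brief sketch of why a deep clone can be useful, assuming the usual mlr3 behaviour that a deep clone also copies the learner's parameter set: changing k on the copy is then expected to leave the original learner untouched.

```r
library(mlr3)
library(mlr3cluster)

learner = lrn("clust.clara")

# Deep clone, so nested objects such as the parameter set are copied too
copy = learner$clone(deep = TRUE)
copy$param_set$set_values(k = 4L)

# Compare the k value held by each object
learner$param_set$values$k
copy$param_set$values$k
```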

Examples

library(mlr3)
library(mlr3cluster)

# Define the Learner (k defaults to 2)
learner = lrn("clust.clara")
print(learner)
#> 
#> ── <LearnerClustCLARA> (clust.clara): CLARA ────────────────────────────────────
#> • Model: -
#> • Parameters: k=2
#> • Packages: mlr3, mlr3cluster, cluster, and clue
#> • Predict Types: [partition]
#> • Feature Types: logical, integer, and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: complete, exclusive, and partitional
#> • Other settings: use_weights = 'error'

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)
#> Call:	 cluster::clara(x = task$data(), k = 2L) 
#> Medoids:
#>      Assault Murder Rape UrbanPop
#> [1,]     255   12.1 35.1       74
#> [2,]     115    6.0 18.0       66
#> Objective function:	 38.4178
#> Clustering vector: 	 int [1:50] 1 1 1 1 1 1 2 1 1 1 2 2 1 2 2 2 2 1 ...
#> Cluster sizes:	    	 21 29 
#> Best sample:
#>  [1]  2  3  4  5  6  8  9 10 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
#> [26] 29 30 32 33 34 35 36 37 38 39 40 41 43 45 46 47 48 49 50
#> 
#> Available components:
#>  [1] "sample"     "medoids"    "i.med"      "clustering" "objective" 
#>  [6] "clusinfo"   "diss"       "call"       "silinfo"    "data"      

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)
#> clust.dunn 
#>  0.1033191
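The example above predicts on the same rows used for training. As a hedged sketch of assigning unseen observations to the fitted medoids, the learner can instead be trained on a subset of rows and asked to predict the rest. The 40/10 split is arbitrary and chosen only for illustration.

```r
library(mlr3)
library(mlr3cluster)

task = tsk("usarrests")
learner = lrn("clust.clara", k = 2L)

# Arbitrary illustrative split: first 40 rows for fitting, last 10 held out
train_ids = task$row_ids[1:40]
test_ids = task$row_ids[41:50]

learner$train(task, row_ids = train_ids)

# Cluster memberships for the held-out rows
prediction = learner$predict(task, row_ids = test_ids)

# One cluster label per held-out observation
prediction$partition
table(prediction$partition)
```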