CLARA Clustering Learner

Clustering Large Applications (CLARA). Calls cluster::clara() from package cluster.

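CLARA extends the PAM (Partitioning Around Medoids) algorithm to larger datasets. Instead of running PAM on all observations, it runs PAM on several random subsamples of fixed size (parameters samples and sampsize), assigns every observation to its nearest medoid, and keeps the medoid set with the lowest average dissimilarity over the full dataset. The k parameter is set to 2 by default because cluster::clara() has no default for the number of clusters. The predict method uses clue::cl_predict() to assign new observations to clusters.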
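The subsampling scheme and the prediction step can be sketched by calling cluster::clara() and clue::cl_predict() directly, outside of mlr3. This is an illustration of what the learner wraps, not its internal code; the values samples = 5 and sampsize = 20 are arbitrary choices for the 50-row USArrests data.

```r
library(cluster)
library(clue)

# USArrests ships with base R (package datasets): 50 US states, 4 numeric columns
x = USArrests

# Run PAM on 5 random subsamples of 20 rows each and keep the medoid set with
# the lowest average dissimilarity over all 50 rows. With the default
# rngR = FALSE, clara() uses its own built-in generator, so no set.seed() is needed.
fit = clara(x, k = 2, samples = 5, sampsize = 20)

# Every row of the full data gets a cluster id, not only the sampled rows
length(fit$clustering)
fit$medoids

# Assign new rows to the nearest medoid, as the learner's predict method does
new_rows = x[c("Texas", "Vermont"), ]
cl_predict(fit, newdata = new_rows, type = "class_ids")
```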

Initial parameter values

keep.data:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the training data in the model to save memory.
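A short sketch of inspecting the initial values and restoring the cluster::clara() default for keep.data; it assumes only the documented lrn() sugar, which accepts hyperparameter values as named arguments.

```r
library(mlr3)
library(mlr3cluster)

# Initial values set by the learner: k = 2 and keep.data = FALSE
learner = lrn("clust.clara")
learner$param_set$values

# Restore cluster::clara()'s own default so the fitted model keeps the data
learner_keep = lrn("clust.clara", keep.data = TRUE)
learner_keep$train(tsk("usarrests"))

# The stored data appears as an additional "data" component of the model
names(learner_keep$model)
```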

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.clara")
lrn("clust.clara")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, cluster, clue
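The meta information above can also be queried at run time from the learner's fields; this minimal sketch uses the standard mlr3::Learner fields task_type, predict_types, feature_types and packages.

```r
library(mlr3)
library(mlr3cluster)

learner = lrn("clust.clara")

# Each field corresponds to one line of the meta information
learner$task_type      # task type
learner$predict_types  # predict types
learner$feature_types  # supported feature types
learner$packages       # packages required at train/predict time
```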

Parameters

Id	Type	Default	Levels	Range
k	integer	-		$[1, \infty)$
metric	character	euclidean	euclidean, manhattan, jaccard	-
stand	logical	FALSE	TRUE, FALSE	-
samples	integer	5		$[1, \infty)$
sampsize	integer	-		$[1, \infty)$
trace	integer	0		$[0, \infty)$
medoids.x	logical	TRUE	TRUE, FALSE	-
keep.data	logical	TRUE	TRUE, FALSE	-
rngR	logical	FALSE	TRUE, FALSE	-
pamLike	logical	FALSE	TRUE, FALSE	-
correct.d	logical	TRUE	TRUE, FALSE	-
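Several parameters from the table can be combined in one call. In this sketch the values (three clusters, Manhattan distance, 20 subsamples of 30 rows, PAM-style swap phase) are illustrative, not recommendations; note that sampsize must exceed k and must not exceed the number of observations.

```r
library(mlr3)
library(mlr3cluster)

learner = lrn("clust.clara",
  k = 3,                # number of clusters
  metric = "manhattan", # dissimilarity between observations
  samples = 20,         # number of subsamples drawn
  sampsize = 30,        # rows per subsample (k < sampsize <= 50 here)
  pamLike = TRUE        # use the same swap algorithm as cluster::pam()
)
learner$train(tsk("usarrests"))

# One medoid per cluster and one cluster id per observation
nrow(learner$model$medoids)
length(learner$model$clustering)
```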

References

Kaufman, Leonard, Rousseeuw, Peter J (2009). Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.

Schubert, Erich, Rousseeuw, Peter J (2019). “Faster k-medoids clustering: improving the PAM, CLARA, and CLARANS algorithms.” In Similarity Search and Applications: 12th International Conference, SISAP 2019, Newark, NJ, USA, October 2–4, 2019, Proceedings 12, 171–187. Springer.

See also

Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
Package mlr3extralearners for more learners.
Dictionary of Learners: mlr3::mlr_learners
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
mlr3pipelines to combine learners with pre- and postprocessing steps.
Package mlr3viz for some generic visualizations.
Extension packages for additional task types:
- mlr3proba for probabilistic supervised regression and survival analysis.
- mlr3cluster for unsupervised clustering.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustCLARA
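The inheritance chain is visible in the R6 class vector of any instance; a minimal check:

```r
library(mlr3)
library(mlr3cluster)

learner = lrn("clust.clara")

# Class vector from most specific to most general
class(learner)

# The learner is both a clustering learner and a generic mlr3 learner
inherits(learner, "LearnerClust")
inherits(learner, "Learner")
```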

Methods

Public methods

`LearnerClustCLARA$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustCLARA$new()
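Calling the constructor directly should give the same learner as the dictionary or lrn(). This sketch assumes mlr3cluster exports the class LearnerClustCLARA, as the usage line above suggests.

```r
library(mlr3)
library(mlr3cluster)

# Construct via the R6 class and via the sugar function
learner_new = LearnerClustCLARA$new()
learner_lrn = lrn("clust.clara")

# Same id and same initial parameter values (k = 2, keep.data = FALSE)
learner_new$id
identical(learner_new$param_set$values, learner_lrn$param_set$values)
```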

`LearnerClustCLARA$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustCLARA$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
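A deep clone copies the learner together with its parameter set, so changing the copy leaves the original untouched. A minimal sketch:

```r
library(mlr3)
library(mlr3cluster)

learner = lrn("clust.clara")

# Deep clone, then change the number of clusters only in the copy
learner_copy = learner$clone(deep = TRUE)
learner_copy$param_set$values$k = 4

learner$param_set$values$k       # original keeps k = 2
learner_copy$param_set$values$k  # copy now uses k = 4
```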

Examples

# Define the Learner (the initial values k = 2 and keep.data = FALSE are set automatically)
learner = lrn("clust.clara")
print(learner)
#> 
#> ── <LearnerClustCLARA> (clust.clara): CLARA ────────────────────────────────────
#> • Model: -
#> • Parameters: k=2, keep.data=FALSE
#> • Packages: mlr3, mlr3cluster, cluster, and clue
#> • Predict Types: [partition]
#> • Feature Types: logical, integer, and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: complete, exclusive, and partitional
#> • Other settings: use_weights = 'error', predict_raw = 'FALSE'

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)
#> Call:	 cluster::clara(x = task$data(), k = 2L, keep.data = FALSE) 
#> Medoids:
#>      Assault Murder Rape UrbanPop
#> [1,]     255   12.1 35.1       74
#> [2,]     115    6.0 18.0       66
#> Objective function:	 38.4178
#> Clustering vector: 	 int [1:50] 1 1 1 1 1 1 2 1 1 1 2 2 1 2 2 2 2 1 ...
#> Cluster sizes:	    	 21 29 
#> Best sample:
#>  [1]  2  3  4  5  6  8  9 10 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
#> [26] 29 30 32 33 34 35 36 37 38 39 40 41 43 45 46 47 48 49 50
#> 
#> Available components:
#> [1] "sample"     "medoids"    "i.med"      "clustering" "objective" 
#> [6] "clusinfo"   "diss"       "call"       "silinfo"   

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)
#> clust.dunn 
#>  0.1033191
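New observations can be assigned to the trained clusters with the generic mlr3::Learner method predict_newdata(); each row is matched to its nearest medoid through clue::cl_predict(). The three states chosen below are arbitrary; any data frame with the task's feature columns should work.

```r
library(mlr3)
library(mlr3cluster)

learner = lrn("clust.clara")
learner$train(tsk("usarrests"))

# New rows only need the same feature columns as the training task
new_states = USArrests[c("Alaska", "Maine", "Ohio"), ]
prediction_new = learner$predict_newdata(new_states)

# One cluster id (1 or 2) per new row
prediction_new$partition
```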
