Mini Batch K-Means Clustering Learner

A LearnerClust for mini batch k-means clustering implemented in ClusterR::MiniBatchKmeans(). Because ClusterR::MiniBatchKmeans() has no default value for the number of clusters, the clusters parameter is set to 2 by default here. The predict method uses ClusterR::predict_MBatchKMeans() to compute the cluster memberships for new data. The learner supports both partitional clustering (predict type "partition") and fuzzy clustering (predict type "prob").
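The fuzzy mode described above can be requested through the predict type. The following is a minimal sketch, assuming mlr3, mlr3cluster, and ClusterR are installed; the choice of three clusters is arbitrary and only for illustration, and the call may emit a deprecation warning depending on the installed ClusterR version.

```r
library(mlr3)
library(mlr3cluster)

# Request fuzzy memberships instead of hard assignments only
# (clusters = 3L is an arbitrary illustrative value)
learner = lrn("clust.MBatchKMeans", clusters = 3L, predict_type = "prob")
task = tsk("usarrests")

learner$train(task)
prediction = learner$predict(task)

# Hard cluster labels, one per observation
head(prediction$partition)

# Membership matrix, one row per observation and one column per cluster
head(prediction$prob)
```

With predict type "partition" (the default), only the hard cluster labels are filled in.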

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.MBatchKMeans")
lrn("clust.MBatchKMeans")

Meta Information

Task type: “clust”
Predict Types: “partition”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, ClusterR

Parameters

Id	Type	Default	Levels	Range
clusters	integer	2		$[1, \infty)$
batch_size	integer	10		$[1, \infty)$
num_init	integer	1		$[1, \infty)$
max_iters	integer	100		$[1, \infty)$
init_fraction	numeric	1		$[0, 1]$
initializer	character	kmeans++	optimal_init, quantile_init, kmeans++, random	-
early_stop_iter	integer	10		$[1, \infty)$
verbose	logical	FALSE	TRUE, FALSE	-
CENTROIDS	untyped	NULL		-
tol	numeric	1e-04		$[0, \infty)$
tol_optimal_init	numeric	0.3		$[0, \infty)$
seed	integer	1		$(-\infty, \infty)$
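
The parameters in the table can be set when the learner is constructed or changed afterwards through its parameter set. A minimal sketch, assuming mlr3 and mlr3cluster are installed; the specific values are arbitrary and not tuning recommendations.

```r
library(mlr3)
library(mlr3cluster)

# Set hyperparameters at construction time
learner = lrn("clust.MBatchKMeans",
  clusters = 4L,
  batch_size = 20L,
  initializer = "random"
)

# Or update them later on the existing learner
learner$param_set$set_values(max_iters = 50L, seed = 42L)

# Inspect the values that are currently set
learner$param_set$values
```

Values outside the ranges or levels listed in the table are rejected when they are assigned.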

References

Sculley, David (2010). “Web-scale k-means clustering.” In Proceedings of the 19th international conference on World wide web, 1177–1178.

See also

Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
Package mlr3extralearners for more learners.
Dictionary of Learners: mlr3::mlr_learners
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
mlr3pipelines to combine learners with pre- and postprocessing steps.
Extension packages for additional task types:
- mlr3proba for probabilistic supervised regression and survival analysis.
- mlr3cluster for unsupervised clustering.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.

Super classes

mlr3::Learner -> mlr3cluster::LearnerClust -> LearnerClustMiniBatchKMeans

Methods

Method `new()`

Creates a new instance of this R6 class.

Usage

LearnerClustMiniBatchKMeans$new()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustMiniBatchKMeans$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
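
The effect of the deep argument is easiest to see on the parameter set, which is itself an R6 object. A minimal sketch, assuming mlr3 and mlr3cluster are installed; the variable name copy is only illustrative.

```r
library(mlr3)
library(mlr3cluster)

learner = lrn("clust.MBatchKMeans")

# Deep clone: nested R6 objects such as the parameter set are copied too
copy = learner$clone(deep = TRUE)
copy$param_set$set_values(clusters = 5L)

# The original learner keeps its own value
learner$param_set$values$clusters
copy$param_set$values$clusters
```

A deep clone is the safer choice whenever the copy will be reconfigured or retrained independently of the original.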

Examples

# Define the Learner and set parameter values
learner = lrn("clust.MBatchKMeans")
print(learner)
#> 
#> ── <LearnerClustMiniBatchKMeans> (clust.MBatchKMeans): Mini Batch K-Means ──────
#> • Model: -
#> • Parameters: clusters=2
#> • Packages: mlr3, mlr3cluster, and ClusterR
#> • Predict Types: [partition] and prob
#> • Feature Types: logical, integer, and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: complete, exclusive, fuzzy, and partitional
#> • Other settings: use_weights = 'error'

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)
#> Warning: `predict_MBatchKMeans()` was deprecated in ClusterR 1.3.0.
#> ℹ Beginning from version 1.4.0, if the fuzzy parameter is TRUE the function
#>   'predict_MBatchKMeans' will return only the probabilities, whereas currently
#>   it also returns the hard clusters
#> ℹ The deprecated feature was likely used in the ClusterR package.
#>   Please report the issue at <https://github.com/mlampros/ClusterR/issues>.

# Print the model
print(learner$model)
#> $centroids
#>        [,1]     [,2]     [,3]     [,4]
#> [1,] 235.50 12.08333 26.23333 71.16667
#> [2,]  86.25  4.07500 14.22500 48.00000
#> 
#> $WCSS_per_cluster
#>          [,1]     [,2]
#> [1,] 7889.297 4086.948
#> 
#> $best_initialization
#> [1] 1
#> 
#> $iters_per_initialization
#>      [,1]
#> [1,]   26
#> 
#> attr(,"class")
#> [1] "MBatchKMeans"       "k-means clustering"

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)
#> clust.dunn 
#> 0.06244552
