Mini Batch K-Means Clustering Learner
Source: R/LearnerClustMiniBatchKMeans.R
mlr_learners_clust.MBatchKMeans.Rd
A LearnerClust for mini batch k-means clustering implemented in ClusterR::MiniBatchKmeans().
ClusterR::MiniBatchKmeans() has no default value for the number of clusters, so this learner sets the clusters parameter to 2 by default.
The predict method uses ClusterR::predict_MBatchKMeans() to compute the
cluster memberships for new data.
The learner supports both partitional and fuzzy clustering.
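As a minimal sketch of the fuzzy mode, the learner can be constructed with the "prob" predict type listed under Meta Information. The task key usarrests is taken from the Examples section; whether the call runs without warnings may depend on the installed ClusterR version.

```r
library(mlr3)
library(mlr3cluster)

# Request fuzzy (probabilistic) cluster memberships instead of hard labels
learner = lrn("clust.MBatchKMeans", clusters = 2L, predict_type = "prob")

# Train and predict on the same task used in the Examples section
task = tsk("usarrests")
learner$train(task)
prediction = learner$predict(task)

# Hard assignments and per-cluster membership probabilities
head(prediction$partition)
head(prediction$prob)
```

With the default predict type "partition", only the hard assignments in prediction$partition are filled.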
Dictionary
This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():
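Both routes can be sketched as follows, using the key clust.MBatchKMeans that appears in the Examples section:

```r
library(mlr3)
library(mlr3cluster)

# Retrieve the learner from the dictionary of learners
learner = mlr_learners$get("clust.MBatchKMeans")

# Equivalent construction with the sugar function
learner = lrn("clust.MBatchKMeans")
```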
Meta Information
Task type: “clust”
Predict Types: “partition”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, ClusterR
Parameters
| Id | Type | Default | Levels | Range |
| clusters | integer | 2 | | \([1, \infty)\) |
| batch_size | integer | 10 | | \([1, \infty)\) |
| num_init | integer | 1 | | \([1, \infty)\) |
| max_iters | integer | 100 | | \([1, \infty)\) |
| init_fraction | numeric | 1 | | \([0, 1]\) |
| initializer | character | kmeans++ | optimal_init, quantile_init, kmeans++, random | - |
| early_stop_iter | integer | 10 | | \([1, \infty)\) |
| verbose | logical | FALSE | TRUE, FALSE | - |
| CENTROIDS | untyped | NULL | | - |
| tol | numeric | 1e-04 | | \([0, \infty)\) |
| tol_optimal_init | numeric | 0.3 | | \([0, \infty)\) |
| seed | integer | 1 | | \((-\infty, \infty)\) |
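A short sketch of how these parameters might be set, either at construction or afterwards through the learner's parameter set. The specific values are illustrative assumptions and not recommendations.

```r
library(mlr3)
library(mlr3cluster)

# Set hyperparameters at construction (values chosen for illustration only)
learner = lrn("clust.MBatchKMeans",
  clusters = 3L,
  batch_size = 20L,
  initializer = "random"
)

# Change a single value afterwards via the parameter set
learner$param_set$values$max_iters = 50L

# Inspect the values that are currently set
print(learner$param_set$values)
```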
References
Sculley, David (2010). “Web-scale k-means clustering.” In Proceedings of the 19th international conference on World wide web, 1177–1178.
See also
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
Package mlr3extralearners for more learners.
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
mlr3pipelines to combine learners with pre- and postprocessing steps.
Extension packages for additional task types:
mlr3proba for probabilistic supervised regression and survival analysis.
mlr3cluster for unsupervised clustering.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Other Learner:
mlr_learners_clust.SimpleKMeans,
mlr_learners_clust.agnes,
mlr_learners_clust.ap,
mlr_learners_clust.bico,
mlr_learners_clust.birch,
mlr_learners_clust.cmeans,
mlr_learners_clust.cobweb,
mlr_learners_clust.dbscan,
mlr_learners_clust.dbscan_fpc,
mlr_learners_clust.diana,
mlr_learners_clust.em,
mlr_learners_clust.fanny,
mlr_learners_clust.featureless,
mlr_learners_clust.ff,
mlr_learners_clust.hclust,
mlr_learners_clust.hdbscan,
mlr_learners_clust.kkmeans,
mlr_learners_clust.kmeans,
mlr_learners_clust.mclust,
mlr_learners_clust.meanshift,
mlr_learners_clust.optics,
mlr_learners_clust.pam,
mlr_learners_clust.xmeans
Super classes
mlr3::Learner -> mlr3cluster::LearnerClust -> LearnerClustMiniBatchKMeans
Examples
# Define the Learner and set parameter values
learner = lrn("clust.MBatchKMeans")
print(learner)
#>
#> ── <LearnerClustMiniBatchKMeans> (clust.MBatchKMeans): Mini Batch K-Means ──────
#> • Model: -
#> • Parameters: clusters=2
#> • Packages: mlr3, mlr3cluster, and ClusterR
#> • Predict Types: [partition] and prob
#> • Feature Types: logical, integer, and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: complete, exclusive, fuzzy, and partitional
#> • Other settings: use_weights = 'error'
# Define a Task
task = tsk("usarrests")
# Train the learner on the task
learner$train(task)
#> Warning: `predict_MBatchKMeans()` was deprecated in ClusterR 1.3.0.
#> ℹ Beginning from version 1.4.0, if the fuzzy parameter is TRUE the function
#> 'predict_MBatchKMeans' will return only the probabilities, whereas currently
#> it also returns the hard clusters
#> ℹ The deprecated feature was likely used in the ClusterR package.
#> Please report the issue at <https://github.com/mlampros/ClusterR/issues>.
# Print the model
print(learner$model)
#> $centroids
#>        [,1]     [,2]     [,3]     [,4]
#> [1,] 235.50 12.08333 26.23333 71.16667
#> [2,]  86.25  4.07500 14.22500 48.00000
#>
#> $WCSS_per_cluster
#>          [,1]     [,2]
#> [1,] 7889.297 4086.948
#>
#> $best_initialization
#> [1] 1
#>
#> $iters_per_initialization
#>      [,1]
#> [1,]   26
#>
#> attr(,"class")
#> [1] "MBatchKMeans" "k-means clustering"
# Make predictions for the task
prediction = learner$predict(task)
# Score the predictions
prediction$score(task = task)
#> clust.dunn
#> 0.06244552