Spherical k-means clustering for data on the unit hypersphere.
Calls skmeans::skmeans() from package skmeans.
The k parameter is set to 2 by default since skmeans::skmeans() doesn't have a default value for the number of
clusters.
Observations are partitioned by maximising cosine similarity to cluster prototypes. Predictions on new data assign
each observation to the prototype with the highest cosine similarity. Rows with zero norm are not allowed by
skmeans::skmeans().
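The assignment rule described above can be sketched in base R. This is only an illustration under stated assumptions: `assign_by_cosine()` is a hypothetical helper, not a function of skmeans or mlr3cluster, and the prototypes and data are made up. It is not the code the learner runs internally.

```r
# Hypothetical helper (not part of skmeans or mlr3cluster): assign each row
# of x to the prototype with the highest cosine similarity.
assign_by_cosine = function(x, prototypes) {
  x = as.matrix(x)
  prototypes = as.matrix(prototypes)
  row_norms = sqrt(rowSums(x^2))
  # A zero vector has no direction, so its cosine similarity is undefined.
  if (any(row_norms == 0)) {
    stop("rows with zero norm cannot be assigned")
  }
  # Scale rows to unit length; the inner product of unit vectors is the
  # cosine similarity.
  x_unit = x / row_norms
  p_unit = prototypes / sqrt(rowSums(prototypes^2))
  similarity = x_unit %*% t(p_unit) # n x k matrix
  max.col(similarity, ties.method = "first")
}

# Made-up prototypes and observations for illustration
prototypes = rbind(c(1, 0), c(0, 1))
new_data = rbind(c(3, 1), c(0.2, 5), c(2, 2.5))
assign_by_cosine(new_data, prototypes)
```

Only the direction of each row matters here: multiplying a row by a positive constant leaves its assignment unchanged.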
Dictionary
This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():
mlr_learners$get("clust.skmeans")
lrn("clust.skmeans")
Meta Information
Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, skmeans
Parameters
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| k | integer | - | | \([1, \infty)\) |
| method | character | - | genetic, pclust, CLUTO, gmeans, kmndirs, LIH, LIHC | - |
| m | numeric | 1 | | \([1, \infty)\) |
| weights | untyped | 1 | | - |
| maxiter | integer | - | | \([1, \infty)\) |
| nruns | integer | - | | \([1, \infty)\) |
| popsize | integer | - | | \([1, \infty)\) |
| mutations | numeric | - | | \([0, 1]\) |
| reltol | numeric | - | | \([0, \infty)\) |
| verbose | logical | - | TRUE, FALSE | - |
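As a minimal sketch of how the number of clusters might be changed from its preset value of 2, the snippet below sets `k` at construction and again through the parameter set. The values 3 and 4 are arbitrary illustrations, and the other parameters in the table are left untouched because their interplay with `method` is not described here.

```r
library(mlr3)
library(mlr3cluster)

# Set k when constructing the learner (3 is an arbitrary illustrative value)
learner = lrn("clust.skmeans", k = 3)

# Or change it afterwards through the parameter set
learner$param_set$set_values(k = 4)
learner$param_set$values$k
```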
References
Dhillon, Inderjit S, Modha, Dharmendra S (2001). “Concept decompositions for large sparse text data using clustering.” Machine Learning, 42(1), 143–175. doi:10.1023/A:1007612920971.
Hornik, Kurt, Feinerer, Ingo, Kober, Martin, Buchta, Christian (2012). “Spherical k-Means Clustering.” Journal of Statistical Software, 50(10), 1–22. doi:10.18637/jss.v050.i10.
See also
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
Package mlr3extralearners for more learners.
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
mlr3pipelines to combine learners with pre- and postprocessing steps.
Extension packages for additional task types:
mlr3proba for probabilistic supervised regression and survival analysis.
mlr3cluster for unsupervised clustering.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Other Learner:
mlr_learners_clust.MBatchKMeans,
mlr_learners_clust.SimpleKMeans,
mlr_learners_clust.agnes,
mlr_learners_clust.ap,
mlr_learners_clust.bico,
mlr_learners_clust.birch,
mlr_learners_clust.clara,
mlr_learners_clust.cmeans,
mlr_learners_clust.cobweb,
mlr_learners_clust.dbscan,
mlr_learners_clust.dbscan_fpc,
mlr_learners_clust.diana,
mlr_learners_clust.em,
mlr_learners_clust.fanny,
mlr_learners_clust.featureless,
mlr_learners_clust.ff,
mlr_learners_clust.flexmix,
mlr_learners_clust.genie,
mlr_learners_clust.hclust,
mlr_learners_clust.hdbscan,
mlr_learners_clust.kcca,
mlr_learners_clust.kkmeans,
mlr_learners_clust.kmeans,
mlr_learners_clust.kproto,
mlr_learners_clust.mclust,
mlr_learners_clust.meanshift,
mlr_learners_clust.movMF,
mlr_learners_clust.optics,
mlr_learners_clust.pam,
mlr_learners_clust.protoclust,
mlr_learners_clust.som,
mlr_learners_clust.specc,
mlr_learners_clust.stdbscan,
mlr_learners_clust.tclust,
mlr_learners_clust.xmeans
Super classes
mlr3::Learner -> LearnerClust -> LearnerClustSKMeans
Examples
# Define the Learner and set parameter values
learner = lrn("clust.skmeans")
print(learner)
#>
#> ── <LearnerClustSKMeans> (clust.skmeans): Spherical K-Means ────────────────────
#> • Model: -
#> • Parameters: k=2
#> • Packages: mlr3, mlr3cluster, and skmeans
#> • Predict Types: [partition]
#> • Feature Types: logical, integer, and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: complete, exclusive, and partitional
#> • Other settings: use_weights = 'error', predict_raw = 'FALSE'
# Define a Task
task = tsk("usarrests")
# Train the learner on the task
learner$train(task)
# Print the model
print(learner$model)
#> A hard spherical k-means partition of 50 objects into 2 classes.
#> Class sizes: 33, 17
#> Call: skmeans::skmeans(x = as.matrix(task$data()), k = 2L, control = structure(list(), names = character(0)))
# Make predictions for the task
prediction = learner$predict(task)
# Score the predictions
prediction$score(task = task)
#> clust.dunn
#> 0.03303237