K-prototypes clustering for mixed-type data.
Calls clustMixType::kproto() from package clustMixType.
The k parameter (number of clusters) defaults to 2, since clustMixType::kproto() itself provides no default for the number of clusters.
Dictionary
This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():
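A minimal sketch of both construction routes (the key "clust.kproto" matches the Examples section below; mlr3 and mlr3cluster must be loaded):

```r
library(mlr3)
library(mlr3cluster)

# Route 1: retrieve the learner from the mlr_learners dictionary
learner = mlr_learners$get("clust.kproto")

# Route 2: the equivalent sugar function
learner = lrn("clust.kproto")
```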
Meta Information
Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”, “factor”, “ordered”
Required Packages: mlr3, mlr3cluster, clustMixType
Parameters
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| k | untyped | - | - | - |
| lambda | untyped | NULL | - | - |
| type | character | huang | huang, gower | - |
| iter.max | integer | 100 | - | \([1, \infty)\) |
| nstart | integer | 1 | - | \([1, \infty)\) |
| na.rm | character | yes | yes, no, imp.internal, imp.onestep | - |
| verbose | logical | TRUE | TRUE, FALSE | - |
| init | character | NULL | nbh.dens, sel.cen, nstart.m | - |
| p_nstart.m | numeric | 0.9 | - | \([0, 1]\) |
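Parameters from the table above can be set at construction time or changed later through the learner's parameter set; a sketch with illustrative values (k = 3, nstart = 5, and iter.max = 200 are arbitrary choices, not defaults):

```r
library(mlr3)
library(mlr3cluster)

# Set hyperparameters when constructing the learner
learner = lrn("clust.kproto", k = 3, nstart = 5, verbose = FALSE)

# ...or modify them afterwards via the parameter set
learner$param_set$values$iter.max = 200
```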
References
Huang, Zhexue (1998). “Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values.” Data Mining and Knowledge Discovery, 2(3), 283–304.
See also
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
Package mlr3extralearners for more learners.
as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
mlr3pipelines to combine learners with pre- and postprocessing steps.
Extension packages for additional task types:
mlr3proba for probabilistic supervised regression and survival analysis.
mlr3cluster for unsupervised clustering.
mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Other Learner:
mlr_learners_clust.MBatchKMeans,
mlr_learners_clust.SimpleKMeans,
mlr_learners_clust.agnes,
mlr_learners_clust.ap,
mlr_learners_clust.bico,
mlr_learners_clust.birch,
mlr_learners_clust.clara,
mlr_learners_clust.cmeans,
mlr_learners_clust.cobweb,
mlr_learners_clust.dbscan,
mlr_learners_clust.dbscan_fpc,
mlr_learners_clust.diana,
mlr_learners_clust.em,
mlr_learners_clust.fanny,
mlr_learners_clust.featureless,
mlr_learners_clust.ff,
mlr_learners_clust.hclust,
mlr_learners_clust.hdbscan,
mlr_learners_clust.kkmeans,
mlr_learners_clust.kmeans,
mlr_learners_clust.mclust,
mlr_learners_clust.meanshift,
mlr_learners_clust.optics,
mlr_learners_clust.pam,
mlr_learners_clust.protoclust,
mlr_learners_clust.specc,
mlr_learners_clust.xmeans
Super classes
mlr3::Learner -> mlr3cluster::LearnerClust -> LearnerClustKProto
Examples
# Define the Learner and set parameter values
learner = lrn("clust.kproto")
print(learner)
#>
#> ── <LearnerClustKProto> (clust.kproto): K-Prototypes ───────────────────────────
#> • Model: -
#> • Parameters: k=2, verbose=FALSE
#> • Packages: mlr3, mlr3cluster, and clustMixType
#> • Predict Types: [partition]
#> • Feature Types: logical, integer, numeric, factor, and ordered
#> • Encapsulation: none (fallback: -)
#> • Properties: complete, exclusive, and partitional
#> • Other settings: use_weights = 'error'
# Define a mixed-type Task (kproto requires at least one factor variable)
data = data.frame(
x1 = c(1, 2, 10, 11, 1, 2, 10, 11),
x2 = factor(c("a", "a", "b", "b", "a", "a", "b", "b"))
)
task = as_task_clust(data)
# Train the learner on the task
learner$train(task)
# Print the model
print(learner$model)
#> Distance type: huang
#>
#> Numeric predictors: 1
#> Categorical predictors: 1
#> Lambda: 46.85714
#>
#> Number of Clusters: 2
#> Cluster sizes: 4 4
#> Within cluster error: 1 1
#>
#> Cluster prototypes:
#> x1 x2
#> <num> <fctr>
#> 1: 10.5 b
#> 2: 1.5 a
# Make predictions for the task
prediction = learner$predict(task)
# Score the predictions
prediction$score(task = task)
#> Warning: NAs introduced by coercion
#> clust.dunn
#> 8
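The same prediction can be scored with other cluster measures shipped with mlr3cluster; a sketch using the silhouette measure (the measure key "clust.silhouette" is assumed to be available in the loaded mlr3cluster version):

```r
library(mlr3)
library(mlr3cluster)

# Rebuild the mixed-type task and learner from the example above
data = data.frame(
  x1 = c(1, 2, 10, 11, 1, 2, 10, 11),
  x2 = factor(c("a", "a", "b", "b", "a", "a", "b", "b"))
)
task = as_task_clust(data)
learner = lrn("clust.kproto")
learner$train(task)
prediction = learner$predict(task)

# Score with the average silhouette width instead of the Dunn index
score = prediction$score(msr("clust.silhouette"), task = task)
```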