K-Means Clustering Learner from Weka

A LearnerClust for simple k-means clustering, implemented in RWeka::SimpleKMeans(). The predict method uses RWeka::predict.Weka_clusterer() to compute cluster memberships for new data.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.SimpleKMeans")
lrn("clust.SimpleKMeans")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, RWeka

Parameters

Id	Type	Default	Levels	Range
A	untyped	"weka.core.EuclideanDistance"		-
C	logical	FALSE	TRUE, FALSE	-
fast	logical	FALSE	TRUE, FALSE	-
I	integer	100		$[1, \infty)$
init	integer	0		$[0, 3]$
M	logical	FALSE	TRUE, FALSE	-
max_candidates	integer	100		$[1, \infty)$
min_density	integer	2		$[1, \infty)$
N	integer	2		$[1, \infty)$
num_slots	integer	1		$[1, \infty)$
O	logical	FALSE	TRUE, FALSE	-
periodic_pruning	integer	10000		$[1, \infty)$
S	integer	10		$[0, \infty)$
t2	numeric	-1		$(-\infty, \infty)$
t1	numeric	-1.5		$(-\infty, \infty)$
V	logical	FALSE	TRUE, FALSE	-
output_debug_info	logical	FALSE	TRUE, FALSE	-
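The parameters in the table can be set either at construction time or later through the learner's parameter set. The following is a minimal sketch, assuming the packages mlr3, mlr3cluster and mlr3extralearners are installed and that mlr3extralearners registers this learner. Reading N as the number of clusters, I as the maximum number of iterations and S as the random seed follows Weka's option naming for SimpleKMeans and is not stated on this page.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3extralearners)  # assumed to register "clust.SimpleKMeans"

# Set a parameter while constructing the learner
# (N: number of clusters, per Weka's option naming)
learner = lrn("clust.SimpleKMeans", N = 3L)

# Change further parameters afterwards
# (I: maximum iterations, S: random seed, per Weka's option naming)
learner$param_set$set_values(I = 50L, S = 1L)

# Inspect the values that are currently set
print(learner$param_set$values)
```

Parameters that are not set explicitly keep the defaults listed in the table.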

References

Witten, Ian H, Frank, Eibe (2002). “Data mining: practical machine learning tools and techniques with Java implementations.” ACM SIGMOD Record, 31(1), 76–77.

Forgy, Edward W (1965). “Cluster analysis of multivariate data: efficiency versus interpretability of classifications.” Biometrics, 21, 768–769.

Lloyd, Stuart P (1982). “Least squares quantization in PCM.” IEEE Transactions on Information Theory, 28(2), 129–137.

MacQueen, James (1967). “Some methods for classification and analysis of multivariate observations.” In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, 281–297.

See also

- Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners
- Package mlr3extralearners for more learners.
- Dictionary of Learners: mlr3::mlr_learners
- as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
- mlr3pipelines to combine learners with pre- and postprocessing steps.
- Extension packages for additional task types:
  - mlr3proba for probabilistic supervised regression and survival analysis.
  - mlr3cluster for unsupervised clustering.
- mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.

Super classes

mlr3::Learner -> mlr3cluster::LearnerClust -> LearnerClustSimpleKMeans

Methods

Method `new()`

Creates a new instance of this R6 class.

Usage

LearnerClustSimpleKMeans$new()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustSimpleKMeans$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
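As a small illustration of why a deep clone is usually what you want for learners, the sketch below changes a parameter on the copy and checks that the original is unaffected. It assumes mlr3, mlr3cluster and mlr3extralearners are installed; the value chosen for N is arbitrary.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3extralearners)  # assumed to register "clust.SimpleKMeans"

learner = lrn("clust.SimpleKMeans")

# A deep clone also copies nested R6 objects such as the parameter set
learner_copy = learner$clone(deep = TRUE)
learner_copy$param_set$set_values(N = 4L)

# The copy carries the new value; the original has no value set for N
print(learner_copy$param_set$values$N)
print(is.null(learner$param_set$values$N))
```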

Examples

# Define the Learner (all parameters at their default values)
learner = lrn("clust.SimpleKMeans")
print(learner)
#> 
#> ── <LearnerClustSimpleKMeans> (clust.SimpleKMeans): K-Means (Weka) ─────────────
#> • Model: -
#> • Parameters: list()
#> • Packages: mlr3, mlr3cluster, and RWeka
#> • Predict Types: [partition]
#> • Feature Types: logical, integer, and numeric
#> • Encapsulation: none (fallback: -)
#> • Properties: complete, exclusive, and partitional
#> • Other settings: use_weights = 'error'

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)
#> 
#> kMeans
#> ======
#> 
#> Number of iterations: 4
#> Within cluster sum of squared errors: 6.596893867946197
#> 
#> Initial starting points (random):
#> 
#> Cluster 0: 113,7.2,21,65
#> Cluster 1: 159,4.9,29.3,67
#> 
#> Missing values globally replaced with mean/mode
#> 
#> Final cluster centroids:
#>                          Cluster#
#> Attribute    Full Data          0          1
#>                 (50.0)     (30.0)     (20.0)
#> ============================================
#> Assault         170.76   114.4333     255.25
#> Murder           7.788       4.87     12.165
#> Rape            21.232    15.9433     29.165
#> UrbanPop         65.54    63.6333       68.4
#> 
#> 
#> 

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)
#> clust.dunn 
#> 0.06459841
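Beyond scoring, the cluster assignments themselves can be inspected on the prediction object. The following is a minimal sketch that repeats the training steps above so it can run on its own, assuming mlr3, mlr3cluster, mlr3extralearners and a working RWeka/Java setup. Restricting the prediction to the first five rows is only for illustration.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3extralearners)  # assumed to register "clust.SimpleKMeans"

task = tsk("usarrests")
learner = lrn("clust.SimpleKMeans")
learner$train(task)

# Predict cluster memberships for all rows of the task
prediction = learner$predict(task)

# One cluster label per observation
print(head(as.data.table(prediction)))

# Cluster sizes
print(table(prediction$partition))

# Predict only for a subset of rows
prediction_subset = learner$predict(task, row_ids = 1:5)
print(prediction_subset$partition)
```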
