Machine Learning: K-Means Mathematics

Published On 2021/12/30 Thursday, Singapore

This post covers the mathematics behind K-Means: the cost function, how it is optimized, how the centroids are initialized, how to choose the number of clusters, and how to evaluate clustering performance.


1. Cost Function

Assume we have a training dataset $D=\{\mathbf{x}_1,\dots,\mathbf{x}_n\}$, where each sample $\mathbf{x}_i\in \mathbb{R}^m$ is a feature vector, $m$ is the number of feature variables, and $n$ is the number of samples.

The k-means algorithm divides the $n$ samples into $k$ disjoint clusters $C=\{C_1,C_2,\dots,C_k\}$, where each cluster $C_j$ is described by its centroid $\mu_j$, computed as the mean of the samples in $C_j$.

The objectives are twofold: samples within the same cluster should be as close to each other as possible, while samples in different clusters should be as far apart as possible.

Thus, the k-means algorithm aims to choose the centroids $\boldsymbol{\mu}=\{\mu_1,\mu_2,\dots,\mu_k\}$ and assign a cluster id $c_i \in \{1,\dots,k\}$ to each sample, collected in $\mathbf{c}=\{c_1,c_2,\dots,c_n\}$, so as to minimize the within-cluster sum of squares:

\[\begin{align*} \arg\min_{\boldsymbol{\mu},\mathbf{c}}E(\boldsymbol{\mu},\mathbf{c}) &= \arg\min_{\boldsymbol{\mu},\mathbf{c}} \sum^{k}_{j=1}\sum_{\mathbf{x}_i \in C_{j}}||\mathbf{x}_i-\mu_j||^2\\ &=\arg\min_{ \boldsymbol{\mu},\mathbf{c}} \sum^{n}_{i=1}||\mathbf{x}_i-\mu_{c_i}||^2 \end{align*}\]

Where $\mathbf{x}_i \in C_j$ indicates that the sample $\mathbf{x}_i$ belongs to cluster $j$, and $\sum_{\mathbf{x}_i \in C_j}||\mathbf{x}_i -\mu_j||^2$ is the sum of squared distances from the samples in cluster $C_j$ to its centroid $\mu_j$. $E$ is the sum of the squared distances of all samples to their corresponding cluster centroids.

The cost function of k-means, also called the distortion, is the average squared distance of all samples to their corresponding centroids:

\[J(\boldsymbol{\mu},\mathbf{c}) = \frac{1}{n} \sum^{n}_{i=1}||\mathbf{x}_i-\mu_{c_i}||^2\tag{1}\]

Where $\mu_{c_i}$ is the centroid of the cluster to which sample $\mathbf{x}_i$ is assigned.
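Equation (1) translates directly into code. A minimal NumPy sketch (the function name and toy data are mine, for illustration only):

```python
import numpy as np

def distortion(X, centroids, c):
    """Average squared distance of each sample to its assigned centroid (Eq. 1).

    X: (n, m) samples; centroids: (k, m); c: (n,) cluster ids."""
    return np.mean(np.sum((X - centroids[c]) ** 2, axis=1))

# Toy example: two tight groups, each sample 0.5 away from its centroid.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.5], [10.0, 10.5]])
c = np.array([0, 0, 1, 1])
print(distortion(X, centroids, c))  # 0.25
```

Indexing the centroid array with the assignment vector `c` selects $\mu_{c_i}$ for every sample at once, so no explicit loop over clusters is needed.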


2. Optimization

To find the centroids and assignments that minimize the cost function, k-means alternates between two steps (Lloyd's algorithm):

1. Initialize the $k$ centroids $\mu_1,\mu_2,\dots,\mu_k$.
2. Assignment step: holding the centroids fixed, assign each sample to its nearest centroid:

\[c_i = \arg\min_{j} ||\mathbf{x}_i-\mu_j||^2\]

3. Update step: holding the assignments fixed, recompute each centroid as the mean of the samples assigned to it:

\[\mu_j = \frac{1}{|C_j|}\sum_{\mathbf{x}_i \in C_j}\mathbf{x}_i\]

The second and third steps are repeated until convergence. Each step can only decrease the cost $J$, so the algorithm is guaranteed to converge, though possibly to a local minimum.
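The alternating steps above can be sketched directly in NumPy. This is a minimal illustration, not a production implementation; the function signature and toy data are my own:

```python
import numpy as np

def kmeans(X, k, init, n_iters=100):
    """Lloyd's algorithm: alternate the assignment and update steps.

    X: (n, m) samples; init: (k, m) initial centroids (assumed given here)."""
    centroids = init.astype(float).copy()
    for _ in range(n_iters):
        # Assignment step: c_i = argmin_j ||x_i - mu_j||^2
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        c = d2.argmin(axis=1)
        # Update step: mu_j = mean of the samples assigned to cluster j
        # (an empty cluster keeps its previous centroid).
        new = np.array([X[c == j].mean(axis=0) if np.any(c == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):  # centroids stopped moving: converged
            break
        centroids = new
    return centroids, c

# Two tight groups; start with one centroid in each group.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids, c = kmeans(X, 2, init=np.array([[0.0, 0.0], [10.0, 10.0]]))
print(centroids)  # [[ 0.   0.5] [10.  10.5]]
print(c)          # [0 0 1 1]
```

The broadcasting trick `X[:, None, :] - centroids[None, :, :]` builds the full $n \times k$ matrix of squared distances in one expression, which keeps both steps free of explicit per-sample loops.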


3. Initialization

Recommended ways of doing random initialization are:

- Make sure $k < n$, and randomly pick $k$ training samples as the initial centroids.
- Run k-means many times with different random initializations and keep the solution with the lowest cost $J$, since a bad initialization can leave the algorithm stuck in a poor local minimum.
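The restart strategy can be sketched as follows (a minimal sketch under the assumptions above; the function name is mine, and each restart runs a basic Lloyd loop before being scored by its cost $J$):

```python
import numpy as np

def kmeans_restarts(X, k, n_restarts=10, n_iters=50, seed=0):
    """Random initialization with restarts: pick k random samples as the
    initial centroids, run k-means, and keep the lowest-cost solution."""
    rng = np.random.default_rng(seed)
    best_cost, best_centroids = np.inf, None
    for _ in range(n_restarts):
        # Random initialization: k distinct training samples (requires k < n).
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iters):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            c = d2.argmin(axis=1)
            for j in range(k):
                if np.any(c == j):
                    centroids[j] = X[c == j].mean(axis=0)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        cost = d2.min(axis=1).mean()  # distortion J of this run
        if cost < best_cost:
            best_cost, best_centroids = cost, centroids
    return best_centroids, best_cost

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids, cost = kmeans_restarts(X, k=2)
print(cost)  # 0.25
```

On this tiny dataset every initialization happens to reach the same optimum, but on real data the restarts typically return noticeably different costs, which is exactly why keeping the best run matters.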


4. How to Choose the Number of Clusters

When it comes to choosing the number of clusters $k$, the following approaches are generally used:

- The elbow method: plot the cost $J$ against $k$ and pick the value at the "elbow", where the cost stops decreasing sharply.
- Choosing $k$ based on the downstream purpose, i.e. how the resulting clusters will actually be used.
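The elbow method can be sketched by running k-means over a range of $k$ and comparing the resulting distortions. This is a toy example with made-up data; `kmeans_cost` is my own minimal helper, not a library call:

```python
import numpy as np

def kmeans_cost(X, k, n_iters=50, n_restarts=10, seed=0):
    """Best (lowest) distortion J over several random initializations."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_restarts):
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iters):
            c = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            for j in range(k):
                if np.any(c == j):
                    centroids[j] = X[c == j].mean(axis=0)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).mean())
    return best

# Three well-separated blobs: the cost drops sharply until k=3, then flattens.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.1, size=(30, 2)) for loc in (0.0, 5.0, 10.0)])
costs = {k: kmeans_cost(X, k) for k in range(1, 7)}
print(costs)  # look for the elbow at k=3
```

Note that $J$ always decreases as $k$ grows (more centroids can only reduce the distances), so the point of interest is not the minimum of the curve but the bend where further increases in $k$ stop paying off.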


5. Evaluating Performance

One internal measure is the Dunn index: the ratio of the smallest inter-cluster distance to the largest intra-cluster diameter. Higher values indicate compact, well-separated clusters.
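The Dunn index can be computed directly from labelled clusters. A minimal sketch (`dunn_index` is my own helper, not a standard library function):

```python
import numpy as np

def dunn_index(X, c):
    """Dunn index = (min distance between points in different clusters)
                  / (max pairwise distance within any single cluster)."""
    labels = np.unique(c)
    # Largest intra-cluster diameter.
    diam = 0.0
    for lab in labels:
        P = X[c == lab]
        if len(P) > 1:
            d = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2))
            diam = max(diam, d.max())
    # Smallest inter-cluster separation.
    sep = np.inf
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            A, B = X[c == a], X[c == b]
            d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))
            sep = min(sep, d.min())
    return sep / diam

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
c = np.array([0, 0, 1, 1])
print(dunn_index(X, c))  # sqrt(181) / 1 ≈ 13.45
```

Here both clusters have diameter 1 and the closest cross-cluster pair is $\sqrt{181}$ apart, so the index is large, reflecting tight, well-separated clusters.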







