The mode clustering

In multivariate analysis (also called unsupervised learning), a common statistical problem is to cluster data pints. For instance, given a data as follows:

EX3-4

 

 

 

 

 

 

 

 

The goal is to cluster the data into several partitions. For instance, like this:

EX3-7

 

 

 

 

 

 

 

 

There are many different methods for clustering. Common approaches include the k-mean clustering, spectral clustering, linkage methods…e.t.c.

Here, I want to introduce the mode clustering method which is a simple but not commonly used technique in statistics. The mode clustering is to cluster the data based on the modes (local maximums) of the density function. The idea is simple: given a density function, for each point, we follow the gradient path for the density until we arrive a mode. Back to our example, the mode clustering looks like this:

EX3-5

 

 

 

 

 

 

 

 

Each blue path is the gradient path from each data point. We have four clusters corresponding to the four modes and every point is assigned to one cluster/mode.

The algorithm for mode clustering is called the “mean-shift” algorithm. This is a well-known method in computer vision and image segmentation. In fact, the mean shift algorithm is pushing data points according to the gradient path of a kernel density estimator to the density function.

To sum up, the mode clustering is

  1. find the estimated density function,
  2. evaluate the gradient of the estimated density,
  3. shift data points according to the gradient.

Since the density estimator is the kernel density estimator, the statistical properties are well-studied. Thus, we can study the properties for the mode clustering by using theories for the kernel density estimator.

Interestingly, the mode clustering is not well-known to the statistical society. But I believe that in the future, this method should attract more attentions from statisticians since compared with other methods, the mode clustering has a well-established theory.

EX3-6

 

 

 

 

 

 

 

 

 

Reference: Enhanced Mode Clustering. (2014) Yen-Chi Chen, Christopher R. Genovese, Larry Wasserman. http://arxiv.org/abs/1406.1780

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s