Plotting results of hierarchical clustering on top of a matrix of data in Python

How can I plot a dendrogram right on top of a matrix of values, reordered appropriately to reflect the clustering, in Python? An example is at the bottom of the following figure:

I use scipy.cluster.hierarchy.dendrogram to make my dendrogram and perform hierarchical clustering on a matrix of data. How can I then plot the data as a matrix where the rows have been reordered to reflect the clustering induced by cutting the dendrogram at a particular threshold, and have the dendrogram plotted alongside the matrix? I know how to plot the dendrogram in scipy, but not how to plot the intensity matrix of data with the right scale bar next to it.

Any help on this would be greatly appreciated.


The question does not define "matrix" very well: it says “matrix of values” and “matrix of data”. I assume that you mean a distance matrix. In other words, element D_ij in the symmetric nonnegative N-by-N distance matrix D denotes the distance between two feature vectors, x_i and x_j. Is that correct?

If so, then try this (edited June 13, 2010, to reflect two different dendrograms):

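import numpy as np
import pylab
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform

# Generate random features and distance matrix.
# (scipy.rand and scipy.zeros are gone from current SciPy; use NumPy.)
x = np.random.rand(40)
D = np.zeros([40,40])
for i in range(40):
    for j in range(40):
        D[i,j] = abs(x[i] - x[j])

# linkage() expects a condensed distance vector. A square matrix would be
# treated as 40 observations with 40 features each.
condensed = squareform(D)

# Compute and plot first dendrogram (left of the matrix).
# In recent SciPy, orientation='left' puts the root on the left,
# so the leaves face the matrix.
fig = pylab.figure(figsize=(8,8))
ax1 = fig.add_axes([0.09,0.1,0.2,0.6])
Y = sch.linkage(condensed, method='centroid')
Z1 = sch.dendrogram(Y, orientation='left')

# Compute and plot second dendrogram (above the matrix).
ax2 = fig.add_axes([0.3,0.71,0.6,0.2])
Y = sch.linkage(condensed, method='single')
Z2 = sch.dendrogram(Y)

# Plot distance matrix, with rows and columns reordered
# to match the leaf order of each dendrogram.
axmatrix = fig.add_axes([0.3,0.1,0.6,0.6])
idx1 = Z1['leaves']
idx2 = Z2['leaves']
D = D[idx1,:]
D = D[:,idx2]
im = axmatrix.matshow(D, aspect='auto', origin='lower', cmap=pylab.cm.YlGnBu)

# Plot colorbar.
axcolor = fig.add_axes([0.91,0.1,0.02,0.6])
pylab.colorbar(im, cax=axcolor)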
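
The question also asks about reordering rows according to the clusters obtained by cutting the dendrogram at a threshold. Here is a minimal sketch of one way to do that. The data, the average linkage, and the threshold of 1.0 are all assumptions made for illustration and are not part of the answer above. It uses fcluster to cut the tree and the dendrogram leaf order to keep members of each cluster adjacent.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

# Hypothetical data: 12 observations (rows) with 5 features each,
# drawn around three well-separated centres, then shuffled.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=c, scale=0.1, size=(4, 5))
                  for c in (0.0, 2.0, 4.0)])
rng.shuffle(data)  # shuffles the rows in place

# pdist returns the condensed distance vector that linkage expects.
Y = sch.linkage(pdist(data), method='average')

# Cut the tree: observations merged below `threshold` share a label.
threshold = 1.0  # assumed value; choose it from your own dendrogram
labels = sch.fcluster(Y, t=threshold, criterion='distance')

# Leaf order of the dendrogram (no_plot=True skips the drawing).
order = sch.dendrogram(Y, no_plot=True)['leaves']

# Reorder rows and their cluster labels to follow the dendrogram.
data_ordered = data[order, :]
labels_ordered = labels[order]
```

The reordered data_ordered can then be passed to matshow in place of D, and labels_ordered tells you where one cluster ends and the next begins.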

Good luck! Let me know if you need more help.

Edit: For different colors, change the cmap argument passed to matshow (imshow accepts the same argument). See the scipy/matplotlib docs for examples. That page also describes how to create your own colormap. For convenience, I recommend using a preexisting colormap. In my example, I used YlGnBu.
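
As a rough illustration of both options, the sketch below draws a placeholder random matrix twice: once with a built-in colormap selected by name, and once with a custom two-color map built with LinearSegmentedColormap.from_list. The matrix, the name white_to_red, and the color choices are made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# Placeholder matrix standing in for the reordered distance matrix.
D = np.random.rand(10, 10)

fig, (ax_a, ax_b) = plt.subplots(1, 2, figsize=(8, 4))

# Option 1: pick a built-in colormap by name.
im_a = ax_a.matshow(D, aspect='auto', origin='lower', cmap='YlGnBu')

# Option 2: build a simple custom colormap that fades from white to red.
white_to_red = LinearSegmentedColormap.from_list('white_to_red',
                                                 ['white', 'red'])
im_b = ax_b.matshow(D, aspect='auto', origin='lower', cmap=white_to_red)

# One colorbar per panel so the two scales can be compared.
fig.colorbar(im_a, ax=ax_a)
fig.colorbar(im_b, ax=ax_b)
```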

Edit: add_axes (see documentation here) accepts a list or tuple: (left, bottom, width, height). For example, (0.5,0,0.5,1) adds an Axes on the right half of the figure. (0,0.5,1,0.5) adds an Axes on the top half of the figure.

Most people probably use add_subplot for its convenience. I like add_axes for its control.

To remove the border, use add_axes([left,bottom,width,height], frame_on=False). See example here.
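
To make the rectangle convention concrete, here is a small sketch that lays out three panels with add_axes, reusing the coordinates from the example above. The variable names are only illustrative. The two side panels drop their border with frame_on=False.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 8))

# Each rectangle is (left, bottom, width, height), given as fractions
# of the figure width and height.
ax_left = fig.add_axes([0.09, 0.1, 0.2, 0.6], frame_on=False)
ax_top = fig.add_axes([0.3, 0.71, 0.6, 0.2], frame_on=False)
ax_main = fig.add_axes([0.3, 0.1, 0.6, 0.6])

# ax_left shares bottom and height with ax_main, and ax_top shares
# left and width with it, so the panels line up edge to edge.
```

Because every position is explicit, a panel can be nudged (for example, to make room for labels) by changing a single number.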

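If, in addition to the matrix and dendrograms, you need to show the labels of the elements, the following code can be used. It shows all the labels, rotating the x labels and reducing the font size to avoid overlap on the x axis. It also moves the colorbar to make room for the y labels, so use the new axcolor line in place of the original one:

# Rows of the reordered matrix follow idx1 and columns follow idx2.
axmatrix.set_xticks(range(len(idx2)))
axmatrix.set_xticklabels(idx2, minor=False)
axmatrix.xaxis.tick_bottom()  # matshow puts x ticks on top by default
pylab.setp(axmatrix.get_xticklabels(), rotation=-90, fontsize=8)
axmatrix.set_yticks(range(len(idx1)))
axmatrix.set_yticklabels(idx1, minor=False)
axmatrix.yaxis.tick_right()
# Shift the colorbar to the right to leave room for the y labels.
axcolor = fig.add_axes([0.94,0.1,0.02,0.6])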

The result obtained is this (with a different color map):

[Figure: the matrix and both dendrograms, with row and column labels shown]