Hierarchical Merging of Region Adjacency Graphs

Region Adjacency Graphs model regions in an image as nodes of a graph with edges between adjacent regions. Superpixel methods tend to over segment images, ie, divide into more regions than necessary. Performing a Normalized Cut and Thresholding Edge Weights are two ways of extracting a better segmentation out of this. What if we could combine two small regions into a bigger one ? If we keep combining small similar regions into bigger ones, we will end up with bigger regions which are significantly different from its adjacent ones. Hierarchical Merging explores this possibility. The current working code can be found at this Pull Request

Code Example

The merge_hierarchical function performs hierarchical merging on a RAG. It picks up the smallest weighing edge and combines the regions connected by it. The new region is adjacent to all previous neighbors of the two combined regions. The weights are updated accordingly. It continues doing so till the minimum edge weight in the graph in more than the supplied thresh value. The function takes a RAG as input where smaller edge weight imply similar regions. Therefore, we use the rag_mean_color function with the default "distance" mode for RAG construction. Here is a minimal code snippet.

from skimage import graph, data, io, segmentation, color

img = data.coffee()
labels = segmentation.slic(img, compactness=30, n_segments=400)
g = graph.rag_mean_color(img, labels)
labels2 = graph.merge_hierarchical(labels, g, 40)
g2 = graph.rag_mean_color(img, labels2)

out = color.label2rgb(labels2, img, kind='avg')
out = segmentation.mark_boundaries(out, labels2, (0, 0, 0))

I arrived at the threshold 40 after some trial and error. Here is the output.


The drawback here is that the thresh argument can vary significantly depending on image to image.

Comparison with Normalized Cut

Loosely speaking the normalized cut follows a top-down approach where as the hierarchical merging follow a bottom-up approach. Normalized Cut starts with the graph as a whole and breaks it down into smaller parts. On the other hand hierarchical merging, starts with individual regions and merges them into bigger ones till a criteria is reached. The Normalized Cut however, is much more robust and requires little tuning of its parameters as images change. Hierarchical merging is a lot faster, even though most of its computation logic is written in Python.

Effect of change in threshold

Setting a very low threshold, will not merge any regions and will give us back the original image. A very large threshold on the other hand would merge all regions and give return the image as just one big blob. The effect is illustrated below.











Hierarchical Merging in Action

With this modification the following code can output the effect of all the intermediate segmentation during each iteration.

from skimage import graph, data, io, segmentation, color
import time
from matplotlib import pyplot as plt

img = data.coffee()
labels = segmentation.slic(img, compactness=30, n_segments=400)
g = graph.rag_mean_color(img, labels)
labels2 = graph.merge_hierarchical(labels, g, 60)

c = 0

out = color.label2rgb(graph.graph_merge.seg_list[-10], img, kind='avg')
for label in graph.graph_merge.seg_list:
    out = color.label2rgb(label, img, kind='avg')
    out = segmentation.mark_boundaries(out, label, (0, 0, 0))
    io.imsave('/home/vighnesh/Desktop/agg/' + str(c) + '.png', out)
    c += 1

I then used avconv -f image2 -r 3 -i %d.png -r 20 car.mp4 to output a video. Below are a few examples.

In each of these videos, at every frame, a boundary dissapears. This means that the two regions separated by that boundary are merged. The frame rate is 5 FPS, so more than one region might be merged at a time.

Coffee Image


Car Image


Baseball Image



Normalized Cuts on Region Adjacency Graphs

In my last post I demonstrated how removing edges with high weights can leave us with a set of disconnected graphs, each of which represents a region in the image. The main drawback however was that the user had to supply a threshold. This value varied significantly depending on the context of the image. For a fully automated approach, we need an algorithm that can remove edges automatically.

The first thing that I can think of which does something useful in the above mention situation is the Minimum Cut Algorithm. It divides a graph into two parts, A and B such that the weight of the edges going from nodes in Set A to the nodes in Set B is minimum.

For the Minimum Cut algorithm to work, we need to define the weights of our Region Adjacency Graph (RAG) in such a way that similar regions have more weight. This way, removing lesser edges would leave us with the similar regions.

Getting Started

For all the examples below to work, you will need to pull from this Pull Request. The tests fail due to outdated NumPy and SciPy versions on Travis. I have also submitted a Pull Request to fix that. Just like the last post, I have a show_img function.

from skimage import graph, data, io, segmentation, color
from matplotlib import pyplot as plt
from skimage.measure import regionprops
from skimage import draw
import numpy as np

def show_img(img):

    width = img.shape[1]/75.0
    height = img.shape[0]*width/img.shape[1]
    f = plt.figure(figsize=(width, height))

I have modified the display_edges function for this demo. It draws nodes in yellow. Edges with low edge weights are greener and edges with high edge weight are more red.

def display_edges(image, g):
    """Draw edges of a RAG on its image

    Returns a modified image with the edges drawn. Edges with high weight are
    drawn in red and edges with a low weight are drawn in green. Nodes are drawn
    in yellow.

    image : ndarray
        The image to be drawn on.
    g : RAG
        The Region Adjacency Graph.
    threshold : float
        Only edges in `g` below `threshold` are drawn.

    out: ndarray
        Image with the edges drawn.

    image = image.copy()
    max_weight = max([d['weight'] for x, y, d in g.edges_iter(data=True)])
    min_weight = min([d['weight'] for x, y, d in g.edges_iter(data=True)])

    for edge in g.edges_iter():
        n1, n2 = edge

        r1, c1 = map(int, rag.node[n1]['centroid'])
        r2, c2 = map(int, rag.node[n2]['centroid'])

        green = 0,1,0
        red = 1,0,0

        line  = draw.line(r1, c1, r2, c2)
        circle = draw.circle(r1,c1,2)
        norm_weight = ( g[n1][n2]['weight'] - min_weight ) / ( max_weight - min_weight )

        image[line] = norm_weight*red + (1 - norm_weight)*green
        image[circle] = 1,1,0

    return image

To see demonstrate the display_edges function, I will load an image, which just has two regions of black and white.

demo_image = io.imread('bw.png')


Let’s compute the pre-segmenetation using the SLIC method. In addition to that, we will also use regionprops to give us the centroid of each region to aid the display_edges function.

labels = segmentation.slic(demo_image, compactness=30, n_segments=100)
labels = labels + 1  # So that no labelled region is 0 and ignored by regionprops
regions = regionprops(labels)

We will use label2rgb to replace each region with its average color. Since the image is so monotonous, the difference is hardly noticeable.

label_rgb = color.label2rgb(labels, demo_image, kind='avg')


We can use mark_boundaries to display region boundaries.

label_rgb = segmentation.mark_boundaries(label_rgb, labels, (0, 1, 1))


As mentioned earlier we need to construct a graph with similar regions having more weights between them. For this we supply the "similarity" option to rag_mean_color.

rag = graph.rag_mean_color(demo_image, labels, mode="similarity")

for region in regions:
    rag.node[region['label']]['centroid'] = region['centroid']

label_rgb = display_edges(label_rgb, rag)


If you notice above the black and white regions have red edges between them, i.e. they are very similar. However the edges between the black and white regions are green, indicating they are less similar.

Problems with the min cut

Consider the following graph


The minimum cut approach would partition the graph as {A, B, C, D} and {E}. It has a tendency to separate out small isolated regions of the graph. This is undesirable for image segmentation as this would separate out small, relatively disconnected regions of the image. A more reasonable partition would be {A, C} and {B, D, E}. To counter this aspect of the minimum cut, we used the Normalized Cut.

The Normalized Cut

It is defined as follows
Let V be the set of all nodes and w(u,v) for u, v \in V be the edge weight between u and v

NCut(A,B) = \frac{cut(A,B)}{Assoc(A,V)} + \frac{cut(A,B)}{Assoc(B,V)}
cut(A,B) = \sum_{a \in A ,b \in B}{w(a,b)}

Assoc(X,V) = cut(X,V) = \sum_{x \in X ,v \in V}{w(x,v)}

With the above equation, NCut won’t be low is any of A or B is not well-connected with the rest of the graph. Consider the same graph as the last one.


We can see that minimizing the NCut gives us the expected partition, that is, {A, C} and {B, D, E}.

Normalized Cuts for Image Segmentation

The idea of using Normalized Cut for segmenting images was first suggested by Jianbo Shi and Jitendra Malik in their paper Normalized Cuts and Image Segmentation. Instead of pixels, we are considering RAGs as nodes.

The problem of finding NCut is NP-Complete. Appendix A of the paper has a proof for it. It is made tractable by an approximation explained in Section 2.1 of the paper. The function _ncut_relabel is responsible for actually carrying out the NCut. It divides the graph into two parts, such that the NCut is minimized. Then for each of the two parts, it recursively carries out the same procedure until the NCut is unstable, i.e. it evaluates to a value greater than the specified threshold. Here is a small snippet to illustrate.

img = data.coffee()

labels1 = segmentation.slic(img, compactness=30, n_segments=400)
out1 = color.label2rgb(labels1, img, kind='avg')

g = graph.rag_mean_color(img, labels1, mode='similarity')
labels2 = graph.cut_normalized(labels1, g)
out2 = color.label2rgb(labels2, img, kind='avg')



NCut in Action

To observe how the NCut works, I wrote a small hack. This shows us the regions as divides by the method at every stage of recursion. The code relies on a modification in the original code, which can be seen here.

from skimage import graph, data, io, segmentation, color
from matplotlib import pyplot as plt
import os

#img = data.coffee()
os.system('rm *.png')
img = data.coffee()
#img = color.gray2rgb(img)

labels1 = segmentation.slic(img, compactness=30, n_segments=400)
out1 = color.label2rgb(labels1, img, kind='avg')

g = graph.rag_mean_color(img, labels1, mode='similarity')
labels2 = graph.cut_normalized(labels1, g)

offset = 1000
count = 1
tmp_labels = labels1.copy()
for g1,g2 in graph.graph_cut.sub_graph_list:
    for n,d in g1.nodes_iter(data=True):
        for l in d['labels']:
            tmp_labels[labels1 == l] = offset
    offset += 1
    for n,d in g2.nodes_iter(data=True):
        for l in d['labels']:
            tmp_labels[labels1 == l] = offset
    offset += 1        
    tmp_img = color.label2rgb(tmp_labels, img, kind='avg')
    io.imsave(str(count) + '.png',tmp_img)
    count += 1

The two components at each stage are stored in the form of tuples in sub_graph_list. Let’s say, the Graph was divided into A and B initially, and later A was divided into A1 and A2. The first iteration of the loop will label A and B. The second iteration will label A1, A2 and B, and so on. I used the PNGs saved and converted them into a video with avconv using the command avconv -f image2 -r 1 -i %d.png -r 20 demo.webm. GIFs would result in a loss of color, so I made webm videos. Below are a few images and their respective successive NCuts. Use Full Screen for better viewing.

Note that although there is a user supplied threshold, it does not have to vary significantly. For all the demos below, the default value is used.

Colors Image


During each iteration, one region (area of the image with the same color) is split into two. A region is represented by its average color. Here’s what happens in the video

  • The image is divided into red, and the rest of the regions (gray at this point)
  • The grey is divided into a dark pink region (pink, maroon and yellow) and a
    dark green ( cyan, green and blue region ).
  • The dark green region is divided into light blue ( cyan and blue ) and the
    green region.
  • The light blue region is divided into cyan and blue
  • The dark pink region is divided into yellow and a darker pink (pink and marron
  • The darker pink region is divided into pink and maroon regions.

Coins Image



Camera Image



Coffee Image



Fruits Image

apples group fruit vegetable isolated on white


Baby Image ( Scaled )



Car Image


Graph based Image Segmentation

My GSoC project this year is Graph based segmentation algorithms using region adjacency graphs. Community binding period is coming to an end. I have experimented a bit with Region Adjacency Graphs (RAGs) and Minimum Spanning Trees (MSTs) with this ugly piece of Python code.  I will try to describe in brief what I plan to do during this GSoC period.


Region Adjacency Graphs

Certain image segmentation algorithms have a tendency to over segment an image. They divide a region as perceived by humans into two or more regions. This is because they tend to favor small regions of similar color. But in the real world one object might have different shades of the same color or different colors all together. Here is an example using SLIC. In broad terms SLIC is k-means done on (X,Y, Z ) color space


Lena and her SLIC

We consider each of these regions as a vertex in a graph. Each region is connected to all the regions that touch it. Similar regions are joined with an edge of less weight. Dissimilar regions are joined with edges oh high weight. One measure of dissimilarity might be difference in the mean color. See the below example.

RAG for Lena


Processing The Region Adjacency Graphs

If we remove the edges with higher weights in an appropriate manner, the regions remaining connected would belong to the same object. Thus in this case the face, the hat, the hair might be finally one connected subgraph of regions. Over the next two weeks I will try to take an over segmented image and build its RAG. As a proof of concept of the underlying data structures and algorithms I will apply a threshold and remove the edges with weights higher than it. Later on I will move onto to more complicated selection procedures including N-cut and if my MST experiments yield good results an MST based procedure.


Batman vs Superman – Image Classification

Batman vs Superman is yet to be out. With Ben Affleck, Gal Gadot etc. already cast in it. In my opinion, when Batman vs Superman happens I want Batman to kick Superman’s ass.

Meanwhile, I just thought about classifying images of Batman and Superman. It turned out to be really simple, at least for the images I considered. It’s my first real attempt at a Computer Vision problem. Checkout  the IPtyhon Notebook here.

Batman vs Superman – IPython Notebook






The 3 wheeled Wonder

Back in 2011 during our second year of Engineering, we( me , Sanket and Dhruv ) had ventured out to build our first robot together as a team. To tell a bit about us, me and Sanket hadn’t even touched so much as a line-follower before. Automata was an event in which we had to manuever the Robot via Image Processing by an overhead camera through a hexagonal grid. The path was pre determined and was given to us as a text file. None of us had the faintest idea of Image Processing and I ended up writing this horrible piece of Matlab code that I haven’t seen since then because I’m too ashamed of it. But never the less, it worked on the day of the competition. But more than the code, we owe our victory to the unique design of our robot. I still remember how overjoyed we were when for the first time the robot took a turn on its own ( guided by the computer ). The eventual result of the contest was beyond our wildest dreams.

The Robot

I have roughly sketched up the top view of the robot here.
Robot Sketch

The omni-wheels used are similar to the one in this picture


Since the grid was hexagonal the three omni-wheels provided a unique advantage. The robot could navigate in 6 directions without rotating about itself. For instance, by keeping wheel A fixed and rotating B and C together, the robot could movie in direction 1 or 4.

Thus, we could complete the assigned task, without turning the robot. This saved time, and also eliminated the Image Processing needed to compute the orientation. All we needed to do, was turn the right wheels and the robot moved in one of 6 directions.

So on the whole, the computer saw the grid, read the sequence from the text file, and instructed the robot accordingly to do the needful ( go, stop or change direction ) via Bluetooth.We were banned from relying on position encoders, so the Computer had to decide, by processing images, when the robot had arrived at the center of one hexagon and had to move to the next. The result as you will see, is in the following video.

Solving Mazes , Like a Boss !

NITK Automata 2012 Winning Run

Me and 2 of my friends , Dhruv and Sanket participated at NIT Surathkal in a contest called Automata. The task was to get the robot to traverse the maze. A Laptop was connected to an overhead camera . The laptop solved the maze, plotted a virtual path and directed the robot successfully to the center.

How we see the maze


How  the computer saw the maze


The three yellow dots are the center of the maze and the two colored blobs on the robot. The green dots represent the path that the robot is supposed to follow.


Things we used

Python, OpenCV for Python and wxWidgets.The computer directed the robot over Bluetooth, communicating through pySerial.