End of the Hunt

After many months of searching and answering interviews, my job hunt has finally come to an end. With it, comes some much needed peace of mind and relaxation. A process which started almost as soon as my first semester began (1.5 years ago) is finally over. This post describes my job search and interview experiences.

Hunting grounds

From my observations, the best chance you have at hearing back is when you apply through campus career fairs. The fact that some organization has taken the time out to specifically come to your university already means that they are looking to hire the students there. Other good places to potentially look for are tech meetups and conferences. Tech meetups have been a hit or miss for me, like this one time I landed up in an event where everybody was a web developer (not what I was looking for). In general, any chance you have of talking to potential employers is much better than sending over your resume (which usually gets read through in less than a minute). On the other hand, although the conversion rate of applying online is really low, it can be done sitting at home and you can cover a lot more companies doing that.


Needless to say that your resume is very important. Apart from the accolades and projects you put in it, I have also paid attention to aesthetics. Another important point to keep in mind is to strike a good balance between technical details and appeal to the general public, because your resume is read by both, experts in the field as well as people from the HR department. Having a webpage helps, specially if you are a programmer. There have been numerous times where potential employers have asked me about links and videos I have on mine. A Github profile with cool projects is also something employers appreciate.


Companies start hiring for internship positions for the summer as soon as the Fall semester starts. An internship tremendously increases your chances of getting a full time offer at the same organization. Among the computer science community, some organizations like Google, Facebook, Microsoft, Amazon etc. are highly preferred. With that said their interview process is long and difficult to crack. From my personal experiences, apart from the tech giants, interning at a startup has more benefits. The first being that you learn a lot, and the second being that future full time employers will be really interested in the work you did, mainly because in a smaller company interns have a lot more responsibility, and more often than not, problems you will encounter would have not been faced by anyone in the company before. If you are applying for research positions, collaborations with Professors at the University are extremely helpful, specially if they are in the same field of study as a particular organization is working in.I interned at a startup called Deep-Magic, where I designed and deployed deep neural networks for image classification. In my last Spring semester I am also interning at ButterflyNetwork as a Deep Learning intern, where I am mainly designing deep neural networks to draw inferences from ultra sound images.


Interviews at the tech giants will almost always involve an algorithmic challenge. You will be expected to come up with a solution on a black board or a shared document. The two most helpful resources I found in this case were Cracking the Coding Interview and Leetcode (they have a neat feature to sort problems by increasing difficulty). Startups tend to generally ask domain specific questions, which I really enjoyed. For someone who is interested in machine learning like me, knowing fundamental machine learning algorithms is a necessity to do well in interviews. For example, why are SVMs called “Support Vector” machines is a question I have been asked more than once.

Maintaining enthusiasm

For me the process took longer than I thought it would. I lost count of how many companies I applied to, most of which never got back. I have often had to go through long periods without hearing back from anyone. And I heard a lot more Noes than yeses. One thing I made sure of is to ask all my potential employers the reason for their rejection, and whenever possible, took steps to make the necessary improvements in my profile or skills.


In the end I managed to receive 4 nice offers. I have chosen the offer from FeatureX where I will be working as a Research Scientist. Since FeatureX is located at Boston, I feel this video is relevant

NYU Fall Semester Recap

Christmas is almost here and I just finished my fall semester at NYU in the pursuit of my Master’s degree. A lot about my life changed in the past few months. The location, the learning experience, my peers, the climate, it was all new to me. It is the first time I traveled anywhere outside my home country, India. It is also the first time in my life that I will be experiencing snowfall (hopefully sometime soon).

Life in NYC

New York City. The reputation of this city definitely precedes it. A city so vastly different from the town I grew up in, yet so eerily familiar. I have seen these streets so many times in popular media that the roads almost seem familiar, yet I find myself getting lost sometimes. The city is fast and glamorous. Early in the chilly mornings you can see a lot of people hurrying about with large cups of coffee, but almost everyone seems busy and determined. In fact, on many occasions, I am the most laid-back person in the crowd (maybe because I hail from Goa). I read a lot about New York City being crowded, but compared to any decent town in India, that’s really not a lot.

But a thing about NYC that is almost always true is that it is expensive, even compared to most other places in the USA. College fees, food, rent, clothing and everything else. Although I rented an apartment in Jersey City, that is still costlier than other places in the country. One thing I learnt quickly was to stop converting everything I purchase to INR.

Finding an apartment was a big hurdle we (my roommates) were all worried about. It took us about 2 weeks to get one. I would like to point out 2 important things I learnt during the process.

  1. Either get a guarantor with credit history to sign the lease, or be ready to pay advance rent to compensate for not having credit history.

  2. Do not contact multiple brokers for the same place. We did that and ended up getting blacklisted with an agency.

In the end we did manage to get a nice place. Here is the view from our window.

NYC is a gold mine for those who are travel hungry. Although I myself haven’t explored much of it. That’s partly because of my course load this semester and partly due to the fact that I have been a little lazy. I did manage to see the Freedom Tower. That building was seriously too tall for me to get my head wrapped around it. And just outside the college campus one day, I encountered what is probably the shooting of the new Ghostbusters movie.


NYU Itself

I was admitted into the MS in Computer Science program at NYU. The initial admission process is a bit tedious, including the the application, registration, visa, SSN, etc., but eventually everything fell into place. The NYU staff is extremely helpful, and everyone I met was more than happy to assist me. Most of my time this semester was either spend doing work or assignments. I will talk about both of them.

Working as a Junior Data Scientist

Even before my classes started, I managed to get an on campus job as a Junior Data Scientist at NYU’s Center for Data Science. This job was everything I was wishing for and more. Most of my time was spent writing patches for [scikit-learn] while being supervised by Andreas Müller. I was programming in Python/Cython for what is arguably the most popular Machine Learning library around. For a person like me, there isn’t a cooler on campus job around.

The Courses

I took the standard course load of 3 courses. The course work was a little more than I anticipated, but everyone gets used to it pretty soon and the semester just whizzes by. I’ll try to briefly summarize the most important things I learnt from each course.

Fundamental Algorithms

I guess this is the single most important subject for a computer scientist overall. I learnt 2 very important things.

  1. Recursion
    The technique of obtaining solutions solving the same problem on smaller instances. I think the most difficult aspect about recursion is making the assumption that smaller instances of the problem are solved. In our effort to obtain a solution, we almost always think of a bottom up approach. But if you can fool yourself to think, for a short time, that a smaller subset of the problem is solved, you can get beautiful solutions to some complex problems.

  2. Correctness
    If you cannot prove it, it’s probably not correct.

Programming Languages

We learnt 5 languages : Ada, Scheme, ML, Java and Scala. This was the first time I saw functional programming languages. Initially the idea of not having variables seemed discomforting, but all the functional code I wrote eventually turned out to be short, clean and recursive. I can see why some people like functional languages, although I am still not convinced it’s the solution to all of the world’s problems. I was also very impressed by ML’s type inference, I never knew it could be done so well. I am convinced that if I ever write a programming language of my own, it will be type inferred.

Computer Vision

This was the most fun course for me. Although I still have a lot to learn in this area, I have learnt that certain tough problems are easily understood by doing appropriate transformations. I also managed to collaborate with my professor to solve the problem of detecting mirror symmetric objects. Our submission is under scrutiny for CVPR 2016.

So semester 1 is complete and I never realized how it went by. It’s Christmas time and I look forward to my much awaited vacation. After this short cozy break, another intense semester awaits.

A thing or two about the Fourier Transform

Recently I have been reading up on frequency domain image processing. I am still just beginning to understand how it works. Over the last few weeks I have been trying to understand the ** Fourier Transform **. Although the gist of  Fourier Series is easy to understand from its formula, that of the Fourier Transform isn’t (to me at least). This blog post describes the Fourier Transform as I understood it, along with some helpful diagrams. I am using NumPy and matplotlib for plotting. As a reference for all the equations mentioned in this post, I am using Digital Image Processing (3rd Edition).

The Fourier Series

I haven’t done any digging into its theory, but the formula for Fourier Series is pretty straight forward. Given f has a period of T

f(t) = \sum\limits_{n=-\infty}^\infty c_{n} e^{i\frac{2 \pi n}{T}}


c_n = \frac{1}{T} \int\limits_{\frac{-T}{2}}^{\frac{T}{2}} f(t)e^{-i\frac{2 \pi n}{T}}

n = 0,\pm 1, \pm2, .....

To simplify this, we can replace \omega=\frac{2\pi}{T} . f now becomes.

f(t) = \sum\limits_{n=-\infty}^\infty  c_n e^{i \omega n}

Using Euler’s Formula

f(t) = \sum\limits_{n=-\infty}^\infty  c_n (cos(n \omega t) + i. sin(n \omega t))

We can say that, f is now exactly equal to an infinite sum of certain sines and cosines. The Wikipedia Page has some pretty nice visualizations of the concept.

The Discrete Fourier Transform

This is what took me a while to understand. The discrete fourier transform, is derived from the Fourier Transform. Given a discrete function f whose M samples have been collected, we have.

F(u) = \sum\limits_{x=0}^{M-1} f(x)e^{-i\frac{2\pi u x}{M}}


u = 0, 1, 2, ..., M -1

As you can see, the function f of x is transformed into the function F of u .
We can get f back by the inverse Fourier Transform

f(x) = \frac{1}{M} \sum\limits_{u=0}^{M-1} F(u)e^{i\frac{2\pi u x}{M}}
x = 0, 1, 2, ... , M -1

Although the relationship of f and F is mathematically proven, the underlying meaning was not immediately apparent to me. What puzzled me was, why F is computed at all, and how can it tell us anything about f . The way I looked at it to understand it better is as follows.

\omega = \frac{2\pi}{M}

f(x) = \frac{1}{M} \sum\limits_{u=0}^{M-1} F(u)e^{i\omega u x}

Using Euler’s formula we have

f(x) = \frac{1}{M}[{F(0).(cos(0) + i.sin(0)) + F(1).(cos(1\omega x) + i.sin(1\omega x)) + F(2).(cos(2\omega x) + i.sin(2\omega x)) + ...} ]

From the above equation, it is clear why the Fourier Transform is important. The function F , has given us coefficients to express f as a sum of sines and cosines. I will try to demonstrate this with some code. I wrote a small helper function to help me input curves. The following function taken an image of a curve drawn in black on white and saves it into a .npy file. You will need the scikit-image library to use this.

from skimage import io, color, util
from matplotlib import pyplot as plt

pylab.rcParams['figure.figsize'] = (8.0, 8.0)

def img2curve(imgfile, arrayfile):
    """ Converts an image of a curve into numpy array and saves it"""

    img = io.imread(imgfile)
    img = color.rgb2gray(img)
    img = util.img_as_ubyte(img)
    array = np.zeros((img.shape[1], ))

    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            if img[r, c] < 128:
                array[c] = img.shape[0] - r
    np.save(arrayfile, array)

def show_curve(array):
    """ Plots real and imaginary parts of an array """

    ax = plt.subplot(111)
    ax.set_xlim(0, len(array))
    ax.set_ylim(-10, len(array))
    plt.plot(np.real(array), color='blue')
    plt.plot(np.imag(array), color='red')

Original Image

img = io.imread("curve.png")


Derived Curve

img2curve("curve.png", "curve.npy")
curve = np.load("curve.npy")


If you’d like to, you can download the original image and the curve.npy file.

Computing the DFT

We will compute the discrete fourier transform using NumPy’s np.fft.fft method and observe the first 5 terms.

curve_fft = np.fft.fft(curve)
print curve_fft[:5]

curve_fft is our function F as described above. Applying the inverse transform to curve_fft should give us back curve exactly. But let’s see what happens by considering an element in F at a time. First we take only F(0) and apply the inverse transform by using np.fft.ifft.

Contribution of F(0)

t_0(x) = \frac{1}{M}[F(0).(cos(0) + i.sin(0))]

tmp = curve_fft.copy()
tmp[1:] = 0

array = np.fft.ifft(tmp)


As you can see, it is a real constant shown in blue.

Contribution of F(1)

t_1(x) = \frac{1}{M}[F(1).(cos(1\omega x) + i.sin(1\omega x))]

tmp = curve_fft.copy()
tmp[2:] = 0
tmp[0:1] = 0

array = np.fft.ifft(tmp)


Contribution of F(2)

t_2(x) = \frac{1}{M}[F(2).(cos(2\omega x) + i.sin(2\omega x))]

tmp = curve_fft.copy()
tmp[3:] = 0
tmp[0:2] = 0

array = np.fft.ifft(tmp)


Contribution of F(0), F(1) and F(2)

t_0(x) + t_1(x) + t_2(x)

tmp = curve_fft.copy()
tmp[3:] = 0

array = np.fft.ifft(tmp)


As you can see the resultant curve is trying to match the original curve. The imaginary part in red, will eventually get zeroed out when all the terms of F are considered.

Fourier Transform in action

The code snippet below generates successive images of contribution of each term in F . In nth still, the blue and red lines show the real and imaginary part of the curve so far, and the dotted lines show the contribution of F(n) .

import numpy as np
from matplotlib import pyplot as plt

curve = np.load("curve.npy")
curve_fft = np.fft.fft(curve)

result = np.zeros_like(curve, dtype=np.complex)

for i in range(curve_fft.shape[0]):

    tmp = np.zeros_like(curve, dtype=np.complex)
    tmp[i] = curve_fft[i]

    tmp_curve = np.fft.ifft(tmp)
    result += tmp_curve

    fig = plt.figure()
    ax = plt.subplot(111)
    ax.set_xlim(0, len(curve))
    ax.set_ylim(-10, len(curve))

    plt.plot(np.real(result), color='blue')
    plt.plot(np.imag(result), color='red')

    plt.title("Adding F(%d)" % i)
    plt.plot(np.real(tmp_curve), 'r--', color='blue')
    plt.plot(np.imag(tmp_curve), 'r--', color='red')

    fig.savefig('out/' + str(i) + ".png")

I ran the command

avconv -f image2 -r 1 -i %d.png -r 20 fft.mp4

to generate the final video.

As I understand DFT more, I’ll keep posting. Till then, thanks for reading.

Noob with a Camera

I recently purchased the Panasonic DMC-FZ70. It’s a semi-DSLR camera with a non-detachable lens capable of some insane zooming. I mainly plan to use it to learn the basics of photography and see if I can get the hang of things.

The Basics

As I fiddled around with the options I discovered that 4 main attributes of the camera governed how the image appeared. The Focus distance, Aperture, Shutter Speed and ISO Sensitivity. In this post I will try to describe how I have understood these parameters for those impatient ones who have recently bought their DSLRs. I have omitted the Focus distance parameter for now as it needs a more elaborate post of its own, maybe for a later time.

Getting Started

For all the photographs I put my camera into the Manual Exposure mode, indicated by an M on its dial, along with Manual Focusing indicated by MF. This allowed me to tune all the 4 parameters mentioned above.

Noob Disclaimer

All the photographs attached in the post were taken by me. They are by no means optimal and an expert might have been able to produce much better photographs under identical conditions. The post only serves as an experiment to demonstrate the camera’s parameters.

The Aperture

Aperture describes the size of the hole which lets light into the camera sensor. The wider this opening, the more light will be let in, and the brighter the image will appear. The aperture is usually described by its f-Number, such as f/2.8, f/5.6 or f/8. This diagram (grabbed from Wikipedia) gives a nice representation of how the f-Number maps to the actual aperture of the camera.

Please do visit the Wikipedia Page for a detailed explanation of the f-Number. A higher f-Number means that the aperture is opened by a lesser amount, and thus lesser light is let in which eventually makes the image dimmer. For example, with all other parameters kept the same, a f/2.8 photo will be brighter than a f/8 photo.

The following 3 photos of Hibiscus flower in my garden, serve to illustrate the effect of Aperture. In the photos, the Aperture is indicated by the A abbreviation in the caption.

Effect of Aperture




It’s clear from the above images that the lowest f-Number produces the brightest and in my opinion the best photo.

Shutter Speed

Shutter Speed is the amount of time the shutter of the camera stays open while capturing the image. Unlike the Aperture, which is indicated by an indirect measurement, the shutter speed is described by its exact duration, such as 1/10s (One tenth of a second or 100ms) or 1/500s (One Five-Hundredth of a second or 2ms). One obvious effect of the shutter speed is on the brightness. A faster shutter speed means less time for the light to form an image on the camera sensor and eventually a darker image.

Shutter Speed comes into play when the subject being photographed is moving. For example, if we are photographing a bird in motion with a shutter speed of 1/2s. Within the 1/2 second the image is captured, if the bird flaps its wings, the wings will leave a blurred artifact on the photograph. Generally speaking, for a crisp image, the shutter speed needs to be quicker than the time in which the subject can move significantly.

Effect of Shutter Speed

Below, I try to illustrate the effect of the shutter speed on the photographs of a spinning coin. In the captions, the shutter speed is abbreviated as SS.




As observed, the spinning coin is simply too fast for the shutter speed of 1/10s. As we increase the shutter speed, the coin becomes less blurred, and finally, it is the most crisp as the shutter speed of 1/500s (2ms). Increasing the shutter speed also lets in less light. That’s the reason why I have compensated for it in the last image by increasing the ISO sensitivity.

ISO Sensitivity

The ISO Sensitivity governs how sensitive the camera’s image sensor is to light. A higher ISO Sensitivity amplifies the electrical signals produced by image sensor (this maps to the actual electrical gain in its transistors). In case you are wondering what ISO stands for, it is an abbreviation for International Organization for Standardization. Among other things, the organization has also standardized the sensitivity of camera image sensors. Hence the name ISO Sensitivity or simply the ISO. Increased ISO means a more bright image, keeping the other parameters constant. A high ISO is often used for indoor or low-light photography.

Below, I have tried to capture a Wooden Toy, which was made in my native town of Sawantwadi. The photos were taken indoors with the room illuminated only by a fluorescent light. The ISO Sensitivity is abbreviated by ISO in the captions.

Effect of ISO Sensitivity




As you can see, increasing the ISO produces a successively brighter image, with the highest ISO of 3200 producing the best image.


With the week I spent exploring my camera, I’ve learnt that photography as an art, requires not only sound technical judgment, but also an eye for creativity. Feel free to point out mistakes our potential improvements in the post. The piece of code I used to generate the caption can be found on this gist. All the original photographs are uploaded here, where you can see a detailed description containing all the camera parameters involved.

GSoC 2014 – Signing off

This years GSoC coding period has nearly come to an end. This post aims to briefly summarize everything that happened during the last three months. My task was to implement Region Adjacency Graph based segmentation algorithms for scikit-image. This post provides a good explanation about them. Below I will list out my major contributions.


Region Adjacency Graphs

Fixing the API for RAGs was very important, since it was directly going to affect everything else that followed. After a long discussion and some benchmarks we finally decided to have NetworkX as a dependency. This helped a lot, since I had a lot of graph algorithms already implemented for me. The file rag.py implements the RAG class and the RAG construction methods. I also implemented threshold_cut, a function which segments images by simply thresholding edge weights. To know more, you can visit, RAG Introduction.

Normalized Cut

The function cut_normazlied, implements the Normalized Cut algorithm for RAGs. You can visit Normalized Cut on RAGs to know more. See the videos at the end to get a quick idea of how NCut works. Also see, A closer look at NCut, where I have benchmarked the function and indicated bottlenecks.

Drawing Regions Adjacency Graphs

In my posts, I had been using a small piece of code I had written to display RAGs. This Pull Request implements the same functionality for scikit-image. This would be immensely useful for anyone who is experimenting with RAGs. For a more detailed explanation, check out Drawing RAGs.

Hierarchical Merging of Region Adjacency Graphs

This Pull Request implements a simple form of Hierarchical Merging. For more details, see Hierarchical Merging of Region Adjacency Graphs. This post also contains videos at the end, do check them out. This can also be easily extended to a boundary map based approach, which I plan to do post-GSoC


Final Comments

The most important thing for me is that I am a better Python programmer as compared to what I was before GSoC began this year. I was able to see how some graph based segmentation methods work at their most basic level. Although GSoC has come to an end, I don’t think my contributions to scikit-image have. Contributing to it has been a tremendous learning experience and plan to continue doing so. I have been been fascinated with Image Processing since me and my friends wrote an unholy piece of Matlab code about 3 years ago to achieve this. And as far as I can see its a fascination I will have for the rest of my life.

Finally, I would like to thank my mentors Juan, Johannes Schönberger and Guillaume Gay. I would also like to thank Stefan for reviewing my Pull Requests.



Hierarchical Merging of Region Adjacency Graphs

Region Adjacency Graphs model regions in an image as nodes of a graph with edges between adjacent regions. Superpixel methods tend to over segment images, ie, divide into more regions than necessary. Performing a Normalized Cut and Thresholding Edge Weights are two ways of extracting a better segmentation out of this. What if we could combine two small regions into a bigger one ? If we keep combining small similar regions into bigger ones, we will end up with bigger regions which are significantly different from its adjacent ones. Hierarchical Merging explores this possibility. The current working code can be found at this Pull Request

Code Example

The merge_hierarchical function performs hierarchical merging on a RAG. It picks up the smallest weighing edge and combines the regions connected by it. The new region is adjacent to all previous neighbors of the two combined regions. The weights are updated accordingly. It continues doing so till the minimum edge weight in the graph in more than the supplied thresh value. The function takes a RAG as input where smaller edge weight imply similar regions. Therefore, we use the rag_mean_color function with the default "distance" mode for RAG construction. Here is a minimal code snippet.

from skimage import graph, data, io, segmentation, color

img = data.coffee()
labels = segmentation.slic(img, compactness=30, n_segments=400)
g = graph.rag_mean_color(img, labels)
labels2 = graph.merge_hierarchical(labels, g, 40)
g2 = graph.rag_mean_color(img, labels2)

out = color.label2rgb(labels2, img, kind='avg')
out = segmentation.mark_boundaries(out, labels2, (0, 0, 0))

I arrived at the threshold 40 after some trial and error. Here is the output.


The drawback here is that the thresh argument can vary significantly depending on image to image.

Comparison with Normalized Cut

Loosely speaking the normalized cut follows a top-down approach where as the hierarchical merging follow a bottom-up approach. Normalized Cut starts with the graph as a whole and breaks it down into smaller parts. On the other hand hierarchical merging, starts with individual regions and merges them into bigger ones till a criteria is reached. The Normalized Cut however, is much more robust and requires little tuning of its parameters as images change. Hierarchical merging is a lot faster, even though most of its computation logic is written in Python.

Effect of change in threshold

Setting a very low threshold, will not merge any regions and will give us back the original image. A very large threshold on the other hand would merge all regions and give return the image as just one big blob. The effect is illustrated below.











Hierarchical Merging in Action

With this modification the following code can output the effect of all the intermediate segmentation during each iteration.

from skimage import graph, data, io, segmentation, color
import time
from matplotlib import pyplot as plt

img = data.coffee()
labels = segmentation.slic(img, compactness=30, n_segments=400)
g = graph.rag_mean_color(img, labels)
labels2 = graph.merge_hierarchical(labels, g, 60)

c = 0

out = color.label2rgb(graph.graph_merge.seg_list[-10], img, kind='avg')
for label in graph.graph_merge.seg_list:
    out = color.label2rgb(label, img, kind='avg')
    out = segmentation.mark_boundaries(out, label, (0, 0, 0))
    io.imsave('/home/vighnesh/Desktop/agg/' + str(c) + '.png', out)
    c += 1

I then used avconv -f image2 -r 3 -i %d.png -r 20 car.mp4 to output a video. Below are a few examples.

In each of these videos, at every frame, a boundary dissapears. This means that the two regions separated by that boundary are merged. The frame rate is 5 FPS, so more than one region might be merged at a time.

Coffee Image


Car Image


Baseball Image


A closert look at Normalized Cut

Variation with number of regions

In this post I explained how the Normalized Cut works and demonstrated some examples of it. This post aims to take a closer look at the code. I ran the following code to monitor the time taken by NCut with respect to initial number of regions.

from __future__ import print_function
from skimage import graph, data, io, segmentation, color
import time
from matplotlib import pyplot as plt

image = data.coffee()
segment_list = range(50, 801, 50)

for nseg in segment_list:
    labels = segmentation.slic(image, compactness=30, n_segments=nseg)
    rag = graph.rag_mean_color(image, labels, mode='similarity')
    T = time.time()
    new_labels = graph.ncut(labels, rag)
    time_taken = time.time() - T
    out = color.label2rgb(new_labels, image, kind='avg')
    io.imsave('/home/vighnesh/Desktop/ncut/' + str(nseg) + '.png', out)
    print(nseg, time_taken)


Here is the output sequentially.

By a little guess-work, I figured that the curve approximately varies as x^2.2. For 800 nodes, the time taken is around 35 seconds.

Line Profile

I used line profiler to examine the time taken by each line of code in threshold_normalized. Here are the results.

   218                                           @profile
   219                                           def _ncut_relabel(rag, thresh, num_cuts):
   220                                               """Perform Normalized Graph cut on the Region Adjacency Graph.
   222                                               Recursively partition the graph into 2, until further subdivision
   223                                               yields a cut greather than `thresh` or such a cut cannot be computed.
   224                                               For such a subgraph, indices to labels of all its nodes map to a single
   225                                               unique value.
   227                                               Parameters
   228                                               ----------
   229                                               labels : ndarray
   230                                                   The array of labels.
   231                                               rag : RAG
   232                                                   The region adjacency graph.
   233                                               thresh : float
   234                                                   The threshold. A subgraph won't be further subdivided if the
   235                                                   value of the N-cut exceeds `thresh`.
   236                                               num_cuts : int
   237                                                   The number or N-cuts to perform before determining the optimal one.
   238                                               map_array : array
   239                                                   The array which maps old labels to new ones. This is modified inside
   240                                                   the function.
   241                                               """
   242        59       218937   3710.8      3.2      d, w = _ncut.DW_matrices(rag)
   243        59          151      2.6      0.0      m = w.shape[0]
   245        59           61      1.0      0.0      if m > 2:
   246        44         3905     88.8      0.1          d2 = d.copy()
   247                                                   # Since d is diagonal, we can directly operate on its data
   248                                                   # the inverse of the square root
   249        44          471     10.7      0.0          d2.data = np.reciprocal(np.sqrt(d2.data, out=d2.data), out=d2.data)
   251                                                   # Refer Shi & Malik 2001, Equation 7, Page 891
   252        44        26997    613.6      0.4          vals, vectors = linalg.eigsh(d2 * (d - w) * d2, which='SM',
   253        44      6577542 149489.6     94.9                                       k=min(100, m - 2))
   255                                                   # Pick second smallest eigenvector.
   256                                                   # Refer Shi & Malik 2001, Section 3.2.3, Page 893
   257        44          618     14.0      0.0          vals, vectors = np.real(vals), np.real(vectors)
   258        44          833     18.9      0.0          index2 = _ncut_cy.argmin2(vals)
   259        44         2408     54.7      0.0          ev = _ncut.normalize(vectors[:, index2])
   261        44        22737    516.8      0.3          cut_mask, mcut = get_min_ncut(ev, d, w, num_cuts)
   262        44           78      1.8      0.0          if (mcut < thresh):
   263                                                       # Sub divide and perform N-cut again
   264                                                       # Refer Shi & Malik 2001, Section 3.2.5, Page 893
   265        29        78228   2697.5      1.1              sub1, sub2 = partition_by_cut(cut_mask, rag)
   267        29          175      6.0      0.0              _ncut_relabel(sub1, thresh, num_cuts)
   268        29           92      3.2      0.0              _ncut_relabel(sub2, thresh, num_cuts)
   269        29           32      1.1      0.0              return
   271                                               # The N-cut wasn't small enough, or could not be computed.
   272                                               # The remaining graph is a region.
   273                                               # Assign `ncut label` by picking any label from the existing nodes, since
   274                                               # `labels` are unique, `new_label` is also unique.
   275        30          685     22.8      0.0      _label_all(rag, 'ncut label')

As you can see above 95% of the time is taken by the call to eigsh.

To take a closer look at it, I plotted time while ensuring only one iteration. This commit here takes care of it. Also, I changed the eigsh call to look for the largest eigenvectors instead of the smallest ones, with this commit here. Here are the results.


A single eignenvalue computation is bounded by O(n^1.5) as mentioned in the original paper. The recursive NCuts are pushing the time required towards more than O(n^2).eigsh solves the eigenvalue problem for a symmetric hermitian matrix. It in turn relies on a library called ARPack. As documented here ARPack isn’t very good at finding the smallest eigenvectors. If the value supplied as the argument k is too small, we get the ArpackNoConvergence Exception. As seen from the above plot, finding the largest eigenvectors is much more efficient using the eigsh function.

Since the problem is specific to ARPack, using other libraries might lead to faster computation. slepc4py is one such BSD licensed library. The possibility of optionally importing slec4py should be certainly explored in the near future.

Also, we can optionally ask the user for a function to solve the eigenvalue problem, so that he can use a matrix library of his choice if he/she so desires.

Final Thoughts

Although the current Normalized Cut implementation takes more than quadratic time, the preceding over segmentation method does most of the heavy lifting. With something like SLIC, we can be sure of the number of nodes irrespective of the input image size. Although, a better eigenvalue finding technique for smallest eigenvectors would immensely improve its performance.