A closert look at Normalized Cut

Variation with number of regions

In this post I explained how the Normalized Cut works and demonstrated some examples of it. This post aims to take a closer look at the code. I ran the following code to monitor the time taken by NCut with respect to initial number of regions.

from __future__ import print_function
from skimage import graph, data, io, segmentation, color
import time
from matplotlib import pyplot as plt

image = data.coffee()
segment_list = range(50, 801, 50)

for nseg in segment_list:
    labels = segmentation.slic(image, compactness=30, n_segments=nseg)
    rag = graph.rag_mean_color(image, labels, mode='similarity')
    T = time.time()
    new_labels = graph.ncut(labels, rag)
    time_taken = time.time() - T
    out = color.label2rgb(new_labels, image, kind='avg')
    io.imsave('/home/vighnesh/Desktop/ncut/' + str(nseg) + '.png', out)
    print(nseg, time_taken)

graph

Here is the output sequentially.

By a little guess-work, I figured that the curve approximately varies as x^2.2. For 800 nodes, the time taken is around 35 seconds.

Line Profile

I used line profiler to examine the time taken by each line of code in threshold_normalized. Here are the results.

   218                                           @profile
   219                                           def _ncut_relabel(rag, thresh, num_cuts):
   220                                               """Perform Normalized Graph cut on the Region Adjacency Graph.
   221                                           
   222                                               Recursively partition the graph into 2, until further subdivision
   223                                               yields a cut greather than `thresh` or such a cut cannot be computed.
   224                                               For such a subgraph, indices to labels of all its nodes map to a single
   225                                               unique value.
   226                                           
   227                                               Parameters
   228                                               ----------
   229                                               labels : ndarray
   230                                                   The array of labels.
   231                                               rag : RAG
   232                                                   The region adjacency graph.
   233                                               thresh : float
   234                                                   The threshold. A subgraph won't be further subdivided if the
   235                                                   value of the N-cut exceeds `thresh`.
   236                                               num_cuts : int
   237                                                   The number or N-cuts to perform before determining the optimal one.
   238                                               map_array : array
   239                                                   The array which maps old labels to new ones. This is modified inside
   240                                                   the function.
   241                                               """
   242        59       218937   3710.8      3.2      d, w = _ncut.DW_matrices(rag)
   243        59          151      2.6      0.0      m = w.shape[0]
   244                                           
   245        59           61      1.0      0.0      if m > 2:
   246        44         3905     88.8      0.1          d2 = d.copy()
   247                                                   # Since d is diagonal, we can directly operate on its data
   248                                                   # the inverse of the square root
   249        44          471     10.7      0.0          d2.data = np.reciprocal(np.sqrt(d2.data, out=d2.data), out=d2.data)
   250                                           
   251                                                   # Refer Shi & Malik 2001, Equation 7, Page 891
   252        44        26997    613.6      0.4          vals, vectors = linalg.eigsh(d2 * (d - w) * d2, which='SM',
   253        44      6577542 149489.6     94.9                                       k=min(100, m - 2))
   254                                           
   255                                                   # Pick second smallest eigenvector.
   256                                                   # Refer Shi & Malik 2001, Section 3.2.3, Page 893
   257        44          618     14.0      0.0          vals, vectors = np.real(vals), np.real(vectors)
   258        44          833     18.9      0.0          index2 = _ncut_cy.argmin2(vals)
   259        44         2408     54.7      0.0          ev = _ncut.normalize(vectors[:, index2])
   260                                           
   261        44        22737    516.8      0.3          cut_mask, mcut = get_min_ncut(ev, d, w, num_cuts)
   262        44           78      1.8      0.0          if (mcut < thresh):
   263                                                       # Sub divide and perform N-cut again
   264                                                       # Refer Shi & Malik 2001, Section 3.2.5, Page 893
   265        29        78228   2697.5      1.1              sub1, sub2 = partition_by_cut(cut_mask, rag)
   266                                           
   267        29          175      6.0      0.0              _ncut_relabel(sub1, thresh, num_cuts)
   268        29           92      3.2      0.0              _ncut_relabel(sub2, thresh, num_cuts)
   269        29           32      1.1      0.0              return
   270                                           
   271                                               # The N-cut wasn't small enough, or could not be computed.
   272                                               # The remaining graph is a region.
   273                                               # Assign `ncut label` by picking any label from the existing nodes, since
   274                                               # `labels` are unique, `new_label` is also unique.
   275        30          685     22.8      0.0      _label_all(rag, 'ncut label')

As you can see above 95% of the time is taken by the call to eigsh.

To take a closer look at it, I plotted time while ensuring only one iteration. This commit here takes care of it. Also, I changed the eigsh call to look for the largest eigenvectors instead of the smallest ones, with this commit here. Here are the results.

one_it

A single eignenvalue computation is bounded by O(n^1.5) as mentioned in the original paper. The recursive NCuts are pushing the time required towards more than O(n^2).eigsh solves the eigenvalue problem for a symmetric hermitian matrix. It in turn relies on a library called ARPack. As documented here ARPack isn’t very good at finding the smallest eigenvectors. If the value supplied as the argument k is too small, we get the ArpackNoConvergence Exception. As seen from the above plot, finding the largest eigenvectors is much more efficient using the eigsh function.

Since the problem is specific to ARPack, using other libraries might lead to faster computation. slepc4py is one such BSD licensed library. The possibility of optionally importing slec4py should be certainly explored in the near future.

Also, we can optionally ask the user for a function to solve the eigenvalue problem, so that he can use a matrix library of his choice if he/she so desires.

Final Thoughts

Although the current Normalized Cut implementation takes more than quadratic time, the preceding over segmentation method does most of the heavy lifting. With something like SLIC, we can be sure of the number of nodes irrespective of the input image size. Although, a better eigenvalue finding technique for smallest eigenvectors would immensely improve its performance.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s