# GSoC – Graph Data Structures Comparison

This post discusses the various graph data structures I have tested. A few days ago I briefly discussed the same in this post. I plan to elaborate more here and also discuss the CSR representation. The tests have been done from the point of view of feasibility to use for Region Adjacency Graphs ( RAGs ) . These images I am testing with are from a (5µm)^3 volume of D. melanogaster larval neuropil imaged at 10x10x10nm resolution using focused ion beam scanning electron microscopy (FIBSEM) ( courtesy Juan ) . The image is 500 x 500 x 500 pixels ( ~500 MB ) . I am also using two images from SNEMI 3D Segmentation challenge   of 1024 x 1024 x 100 pixels.

## Design Requirements

Although I am yet to handle them, Juan has pointed out that typical images he has worked on have had 8000 x 8000 x 8000 pixel resolution with close to 100,000 nodes. If we use a typical adjacency matrix with 8 bit unsigned entries, that would consume 9.5 GB. However the edges in a RAG is comparatively low, as compared to the maximum number of possible edges. Thus, we need a data structure whose storage requirements are a linear function of the number of edges. My initial intuition was to design a class by wrapping a thin layer of code around one of scipy’s sparse matrix classes. But collapsing edges ( joining two nodes ) is something not easily possible in any of these formats. Hence we decided on implementing our own data structure from scratch.

To test the 4 possible that we came up with, I am constructing the RAG for the image, and randomly merging nodes till there are only 10 left. The source code and the benchmarks can be found here.

## LIL Graph Class

See graph_lil.py and rag_lil.pyx. This is the typical adjacency list representation, except the lists aren’t lists, they are numpy arrays. The list of neighbors is also sorted in each case to optimize the lookup ( see add_edge function ). This also makes the code a little more complex than the other approaches. Out of all the approaches, this is the most memory efficient, but also the slowest, since adding an edge and merging nodes requires a lot of memory movement.

### Memory Profile

Apart from the image, the only thing consuming significant memory is the construction of the RAG.

```
Line #    Mem usage    Increment   Line Contents
================================================
12   18.223 MiB    0.000 MiB   @profile
13                             def test():
14  496.133 MiB  477.910 MiB       arr = np.load("../data/watershed.npy")
15  496.133 MiB    0.000 MiB       t = time.time()
16  504.375 MiB    8.242 MiB       g = graph.construct_rag(arr)
17
18
19  504.383 MiB    0.008 MiB       print "RAG construction took %f secs " % (time.time() - t)
20
21  504.383 MiB    0.000 MiB       t = time.time()
22  504.910 MiB    0.527 MiB       g.random_merge(10)
23  504.965 MiB    0.055 MiB       g.display()
24  504.965 MiB    0.000 MiB       print "Merging took %f secs " % (time.time() - t)

```

### Line Profile – Construction

Since line_profiler wouldn’t work with Cython, the output comes from Cython’s in build profiling.From the below output, it’s clear that insert and searchsorted functions consume the most time.

```   Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
72104155  275.539    0.000  640.501    0.000 rag_lil.pyx:18(add_edge)
144208310  274.654    0.000  274.654    0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
1  164.755  164.755  999.104  999.104 {rag_lil.construct_rag_3d_lil}
144208312   90.060    0.000   90.060    0.000 stringsource:317(__cinit__)
144208310   84.936    0.000  359.589    0.000 fromnumeric.py:952(searchsorted)
144208311   66.758    0.000  156.817    0.000 stringsource:613(memoryview_cwrapper)
144208312   21.694    0.000   21.694    0.000 stringsource:339(__dealloc__)
144208311   15.142    0.000   15.142    0.000 stringsource:619(memoryview_check)
158956    3.893    0.000    5.373    0.000 function_base.py:3305(insert)
476868    0.726    0.000    0.726    0.000 {numpy.core.multiarray.array}
199734    0.272    0.000    0.272    0.000 {numpy.core.multiarray.empty}
158956    0.179    0.000    0.424    0.000 numeric.py:392(asarray)
158956    0.163    0.000    0.163    0.000 numeric.py:1299(rollaxis)
1    0.137    0.137    0.137    0.137 {method 'reduce' of 'numpy.ufunc' objects}
158956    0.127    0.000    0.127    0.000 {isinstance}
158956    0.051    0.000    0.051    0.000 {method 'item' of 'numpy.ndarray' objects}
1    0.019    0.019    0.057    0.057 graph_lil.py:8(__init__)
1    0.000    0.000    0.000    0.000 {range}
1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
1    0.000    0.000    0.137    0.137 fromnumeric.py:2048(amax)
1    0.000    0.000    0.137    0.137 _methods.py:15(_amax)
1    0.000    0.000  999.104  999.104 graph_lil.py:59(construct_rag)
1    0.000    0.000    0.000    0.000 stringsource:957(memoryview_fromslice)
1    0.000    0.000  999.104  999.104 <string>:1(<module>)
1    0.000    0.000    0.000    0.000 stringsource:933(__dealloc__)
1    0.000    0.000    0.000    0.000 stringsource:508(__get__)
1    0.000    0.000    0.000    0.000 stringsource:468(__getbuffer__)
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```

### Line Profile – Merging

search, insert, and delete are consuming the most time in this case.

```   Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
20378  117.102    0.006  252.734    0.012 rag_lil.pyx:47(merge_node)
56129898   98.659    0.000   98.659    0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
56129898   32.130    0.000  130.789    0.000 fromnumeric.py:952(searchsorted)
107196    1.997    0.000    2.734    0.000 function_base.py:3112(delete)
68494    1.339    0.000    1.837    0.000 function_base.py:3305(insert)
1    0.605    0.605  254.349  254.349 graph_lil.py:48(random_merge)
419874    0.573    0.000    0.573    0.000 {numpy.core.multiarray.array}
161011    0.427    0.000    0.453    0.000 random.py:173(randrange)
161011    0.263    0.000    0.716    0.000 random.py:236(randint)
282886    0.214    0.000    0.600    0.000 numeric.py:392(asarray)
175690    0.205    0.000    0.205    0.000 {numpy.core.multiarray.empty}
163024    0.162    0.000    0.162    0.000 stringsource:317(__cinit__)
282886    0.134    0.000    0.134    0.000 {isinstance}
20378    0.101    0.000  252.942    0.012 {rag_lil.merge_node_py}
40756    0.085    0.000    0.108    0.000 stringsource:957(memoryview_fromslice)
122268    0.068    0.000    0.213    0.000 stringsource:613(memoryview_cwrapper)
68494    0.065    0.000    0.065    0.000 numeric.py:1299(rollaxis)
20378    0.045    0.000    0.052    0.000 random.py:271(choice)
175690    0.044    0.000    0.044    0.000 {method 'item' of 'numpy.ndarray' objects}
20378    0.033    0.000  252.975    0.012 graph_lil.py:31(merge)
163024    0.033    0.000    0.033    0.000 stringsource:339(__dealloc__)
181389    0.029    0.000    0.029    0.000 {method 'random' of '_random.Random' objects}
1    0.023    0.023  254.371  254.371 <string>:1(<module>)
122268    0.015    0.000    0.015    0.000 stringsource:619(memoryview_check)
40756    0.006    0.000    0.006    0.000 stringsource:508(__get__)
40756    0.006    0.000    0.006    0.000 stringsource:468(__getbuffer__)
40756    0.005    0.000    0.005    0.000 stringsource:933(__dealloc__)
20378    0.005    0.000    0.005    0.000 {len}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects
```

## Netwrokx Graph Class

See graph_nx.py and rag_nx.pyx. This is a subclass of networkx.Graph. The only additional feature I added was the merge function. NetworX mainains a dictionary per node of its  adjacent nodes. Each edge has it’s own dictionary for maintaining it’s properties. This leads to increased memory usage. Because O(1) lookup in dictionaries, edge addition and contraction is very fast.As Juan pointed out to me because of the small load factor of Python’s dictionaries, most buckets remain empty. Dictionaries also have to store the key, as well as the value. Therefore, this required the 2nd highest memory amongst the tested approaches. Juan used a similar approach for his project here and has reported ~100 GB of RAM usage.

### Memory Profile

Graph construction consumes memory, and merging frees up space because we delete one of the merged nodes.

```Line #    Mem usage    Increment   Line Contents
================================================
12   23.023 MiB    0.000 MiB   @profile
13                             def test():
14  500.918 MiB  477.895 MiB       arr = np.load("../data/watershed.npy")
15  500.918 MiB    0.000 MiB       t = time.time()
16  530.703 MiB   29.785 MiB       g = graph.construct_rag(arr)
17
18
19  530.719 MiB    0.016 MiB       print "RAG construction took %f secs " % (time.time() - t)
20
21  530.719 MiB    0.000 MiB       t = time.time()
22  517.906 MiB  -12.812 MiB       g.random_merge(10)
23                                 #g.display()
24  517.906 MiB    0.000 MiB       print "Merging took %f secs " % (time.time() - t)
```

### Line Profile – Construction

Since the function to add edges comes directly from the netwrox class, I haven’t profiled it.

### Line Profile – Merging

Here we have a line by line time profiling bu using the line_profiler module. The most significant contribution is by looping through the neighbors to get edge weights.

```Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
26                                               @profile
27                                               def merge(self, i, j):
28
29     20378        45880      2.3      4.2          if not self.has_edge(i, j):
30                                                       raise ValueError('Cant merge non adjacent nodes')
31
32                                                   # print "before ",self.order()
33     94147       115543      1.2     10.5          for x in self.neighbors(i):
34     73769        58640      0.8      5.3              if x == j:
35     20378        14190      0.7      1.3                  continue
36     53391       139198      2.6     12.6              w1 = self.get_edge_data(x, i)['weight']
37     53391        39027      0.7      3.5              w2 = -1
38     53391       103894      1.9      9.4              if self.has_edge(x, j):
39     19352        42703      2.2      3.9                  w2 = self.get_edge_data(x, j)['weight']
40
41     53391        57866      1.1      5.2              w = max(w1, w2)
42
43     53391       313574      5.9     28.4              self.add_edge(x, j, weight=w)
44
45     20378       172383      8.5     15.6          self.remove_node(i)
```

## Custom Graph Class

See graph_custom.py and rag_custom.pyx. This is something me and Stefan came up with during a chat. This is similar to NetworkX’s approach, but the nodes are stored in a list instead of a dictionary. Also, instead of maintaining a dictionary for a property ( like weight ) per edge, we maintain one dictionary per node, where the keys are the adjacent nodes and the values are the weights. Both the Custom as well as the NetworkX representation have comparatively more memory requirements than LIL because dictionaries have to store the key and value, and also due to the small loading factor of Python’s dictionaries. Although not the most efficient in terms of memory, but it’s the fasted for construction of graph and merging of nodes.

### Memory Profile

Again, we see that the RAG itself adds to the memory other than the image. Merging nodes does not result in a significant decrease in memory because, one of the merged nodes still holds an empty dictionary.

```Line #    Mem usage    Increment   Line Contents
================================================
13   18.184 MiB    0.000 MiB   @profile
14                             def test():
15  496.137 MiB  477.953 MiB       arr = np.load("../data/watershed.npy")
16  496.137 MiB    0.000 MiB       t = time.time()
17  507.371 MiB   11.234 MiB       g = graph.construct_rag(arr)
18
19  507.383 MiB    0.012 MiB       print "RAG construction took %f secs " % (time.time() - t)
20
21                                 # print g.max_size
22  507.383 MiB    0.000 MiB       t = time.time()
23  506.543 MiB   -0.840 MiB       g.random_merge(10)
24  506.543 MiB    0.000 MiB       g.display()
25                                 # print g.max_size
26  506.543 MiB    0.000 MiB       print "Merging took %f secs " % (time.time() - t)
```

### Line Profile – Construction

The dictionary update and the look up, take the most time. Maintaining edge count was done only for test purposes.

```Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
24                                               @profile
25                                               def make_edge(self, i, j, wt):
26  72157546     40339563      0.6     19.9          try:
27  72157546     56802957      0.8     28.0              self.rows[i][j]
28     73778        92866      1.3      0.0          except KeyError:
29     73778        72716      1.0      0.0              self.edge_count += 1
30
31  72157546     52556299      0.7     25.9          self.rows[i][j] = wt
32  72157546     53126454      0.7     26.2          self.rows[j][i] = wt
```

### Line Profile – Merging

Making new edges takes the most time.

```Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
41                                               @profile
42                                               def merge(self, i, j):
43
44     20378        48653      2.4      3.5          if not self.has_edge(i, j):
45                                                       raise ValueError('Cant merge non adjacent nodes')
46
47
48                                                   # print "before ",self.order()
49     94147       100716      1.1      7.3          for x in self.neighbors(i):
50     73769        56567      0.8      4.1              if x == j:
51     20378        14158      0.7      1.0                  continue
52     53391        94329      1.8      6.8              w1 = self.get_weight(x, i)
53     53391        38553      0.7      2.8              w2 = -1
54     53391       181510      3.4     13.1              if self.has_edge(x, j):
55     19352        29334      1.5      2.1                  w2 = self.get_weight(x,j)
56
57     53391        59662      1.1      4.3              w = max(w1, w2)
58
59     53391       648168     12.1     46.8              self.make_edge(x, j, w)
60
61     20378       113651      5.6      8.2          self.remove_node(i)

```

## CSR Graph Class

See graph_csr.py and rag_csr.pyx.  This is inspired for the scipy’s csr_matrix. To account for merging of nodes, we append the data of the merged node towards the end of indptr, indices and data array ( merge function ).  We invalidate the merged nodes by setting a valid flag to False. These arrays keep doubling in size whenever  required ( double function), which leads to O(1) amortized memory movement cost. The merging code is also almost pure cython with very less Python calls, which makes it faster than the LIL approach. As we later discovered, the doubling happens 7 times for our test case, which leads to a 256 times increase in the array size. This results in the highest memory usage among all approaches.

### Memory Profile

Apart from the graph, merging increases memory usage. This is because the arrays double their size, which can happen any number of times. For this particular case, it occurred 9 times.

```Line #    Mem usage    Increment   Line Contents
================================================
13   20.031 MiB    0.000 MiB   @profile
14                             def test():
15  498.074 MiB  478.043 MiB       arr = np.load("../data/watershed.npy")
16  498.074 MiB    0.000 MiB       t = time.time()
17  500.281 MiB    2.207 MiB       g = graph.construct_rag(arr)
18
19  500.293 MiB    0.012 MiB       print "RAG construction took %f secs " % (time.time() - t)
20
21                                 #print g.max_size
22  500.293 MiB    0.000 MiB       t = time.time()
23  825.211 MiB  324.918 MiB       g.random_merge(10)
24                                 #print g.max_size
25  825.211 MiB    0.000 MiB       print "Merging took %f secs " % (time.time() - t)
```

### Line Profile – Construction

For test purposes , this was done using scipy’s dok_matrix.

### Line Profile – Merging

We have to profile Cython code again. As expected, doubling the array and copying elements, take the most time.

```   Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
20378    9.164    0.000   10.663    0.001 rag_csr.pyx:17(merge)
20378    0.515    0.000   11.658    0.001 graph_csr.py:69(merge)
8    0.451    0.056    0.452    0.057 graph_csr.py:51(double)
288383    0.387    0.000    0.415    0.000 random.py:173(randrange)
20378    0.353    0.000    0.353    0.000 {method 'sort' of 'numpy.ndarray' objects}
20378    0.319    0.000    0.823    0.000 arraysetops.py:93(unique)
1    0.318    0.318   12.573   12.573 graph_csr.py:79(random_merge)
40756    0.265    0.000    0.270    0.000 {numpy.core.multiarray.concatenate}
288383    0.141    0.000    0.556    0.000 random.py:236(randint)
142646    0.134    0.000    0.134    0.000 stringsource:317(__cinit__)
20378    0.069    0.000    0.069    0.000 {numpy.core.multiarray.copyto}
101890    0.067    0.000    0.193    0.000 stringsource:613(memoryview_cwrapper)
20378    0.055    0.000    0.055    0.000 {numpy.core.multiarray.empty_like}
20378    0.051    0.000    0.175    0.000 numeric.py:78(zeros_like)
20378    0.049    0.000    1.028    0.000 arraysetops.py:379(union1d)
40756    0.041    0.000    0.054    0.000 stringsource:957(memoryview_fromslice)
20378    0.037    0.000    0.037    0.000 {method 'flatten' of 'numpy.ndarray' objects}
20378    0.035    0.000    0.041    0.000 random.py:271(choice)
1    0.030    0.030   12.603   12.603 <string>:1(<module>)
308761    0.030    0.000    0.030    0.000 {method 'random' of '_random.Random' objects}
142646    0.029    0.000    0.029    0.000 stringsource:339(__dealloc__)
20378    0.028    0.000   10.691    0.001 {rag_csr.merge}
101890    0.014    0.000    0.014    0.000 stringsource:619(memoryview_check)
40756    0.005    0.000    0.005    0.000 stringsource:933(__dealloc__)
40756    0.005    0.000    0.005    0.000 stringsource:468(__getbuffer__)
40756    0.004    0.000    0.004    0.000 stringsource:508(__get__)
20378    0.004    0.000    0.004    0.000 {len}
16    0.001    0.000    0.001    0.000 {numpy.core.multiarray.zeros}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

```

## Time Comparison

The time taken for each approach for all 3 volumes used.

## Memory Usage

This plot shows the maximum memory usage for each approach. Notice That for the CSR approach, for case 1, the memory usage during merging is significantly higher. This is due to the fact that, doubling of the array, can happen any number of times.

Advertisements

## 5 thoughts on “GSoC – Graph Data Structures Comparison”

1. Thank you for taking such care in writing up this insightful blog post, Vighnesh.
Could you comment on why the LIL representation uses so much less memory?
Side note: Networx -> NetworkX

• Also, what is your feeling about a parallel implementation where we give the user the option to choose whether they want fast computation or reduced memory use? Would that be feasible?

• I have added a line about that. It is because Custom and NetworkX representations store keys as well as values, and also doue to the small load factor, many buckets in the ductionary are empty.

2. Pingback: GSoC – Mid Term Status Update | A Simple Progammer's Blog
3. Pingback: GSoC 2014 – Signing off | A Simple Progammer's Blog