Skip to content

Flow Rank Pattern

okram edited this page Sep 17, 2011 · 24 revisions

Many times its important to determine how many times a particular element is traversed over. That is, determine the flow through an element. Particular examples include:

  • “Rank my friends friends by how many friends we share in common.”
  • “Rank items by how many people, who like the same things I like, like them.”
gremlin> software = []
gremlin> g.idx(T.v)[[lang:'java']] >> software
==>true
gremlin> software                                  
==>v[3]
==>v[5]

Vertices v[3] and v[5] are software projects. Lets determine who is on the most software projects. In other words, lets traverse out of these vertices and see which developer vertices get the most flow.

gremlin> software._().in('created').name.groupCount.cap>>1  
==>marko=1
==>peter=1
==>josh=2

Josh received the most traversals through him. The groupCount step maintains an internal Map<Object,Number>. The cap step is used to “cap” groupCount and have it emit its internal map, not the elements that flow through it. The following example better explains “capping” by demonstrating what happens when its not used.

gremlin> m = [:]
gremlin> software._().in('created').name.groupCount(m)
==>marko
==>josh
==>peter
==>josh
gremlin> m
==>marko=1
==>josh=2
==>peter=1

Here is a more complicated example using the loop step and the Grateful Dead graph diagrammed in Defining a More Complex Property Graph.

g = new TinkerGraph()
g.loadGraphML('data/graph-example-2.xml')

The example below will continue to loop until a counter reaches 1000 (so the loop doesn’t continue indefinitely). The loop will walk the outgoing edges of a vertex and update the flow map m. What is returned is how many times each song is traversed when starting from vertex 12 (Me and My Uncle). Note that g.v(12) is not a pipe, so the looping only happens over outE.inV.name.groupCount(m).back(2). Finally filter{false} is appended so no results are outputted to the terminal.

gremlin> c = 0                                                                
==>0
gremlin> m = [:]                                                              
gremlin> g.v(12).out.groupCount(m){it.name}.loop(2){c++ < 1000}.filter{false}
gremlin> m
==>CHINA CAT SUNFLOWER=533
==>RAMBLE ON ROSE=521
==>WHARF RAT=207
==>HES GONE=447
==>HURTS ME TOO=107
==>DARK STAR=367
==>SING ME BACK HOME=114
==>LOOSE LUCY=293
==>BERTHA=460
==>BIG RIVER=357
==>BROKEDOWN PALACE=218
==>DONT EASE ME IN=475
==>BIRD SONG=403
...
gremlin> println g.v(12).out.groupCount(m){it.name}.loop(2){c++ < 1000}.filter{false}
[StartPipe, LoopPipe([OutPipe, GroupCountFunctionPipe]), FilterFunctionPipe]
==>null

Finally, you can sort your rankings and, for example, get the top 5 results.

gremlin> m.sort{a,b -> b.value <=> a.value}[0..4]
==>PLAYING IN THE BAND=586
==>ME AND MY UNCLE=557
==>JACK STRAW=534
==>CHINA CAT SUNFLOWER=533
==>EL PASO=524