
Graphene


Graphene is a format (graphene://) used by CloudVolume and our custom version of neuroglancer that takes a highly over-segmented volume and enables proofreading by linking segments together into larger objects in a graph. https://flywire.ai is a notable use of Graphene.

The Graphene format is backed by a service called the PyChunkGraph, an octree that tracks the spatial extent and time evolution of proofread neurons. The graph state at a given timepoint can be retrieved with a UNIX timestamp. The leaves of the graph (Layer 1) are the atomic units in space and record the initial state of the segmentation contained therein. The graph within each chunk consists of labels and edges between labels indicating that they are connected. Layer 2 of the graph tracks the proofread state of each atomic chunk. Proofreading operations consist of adding and subtracting edges from the graph, which in concrete terms means joining pieces of neuron together or splitting them apart. The higher levels of the graph represent edges between these atomic chunks, and then edges between successively higher levels of the octree. A given label therefore refers to an object at a particular level of the tree, and the highest level of the octree represents fully agglomerated objects.

Graphene Specific CloudVolume Methods

# agglomerate applies the proofreading graph, stop_layer limits how far up
# the graph to agglomerate, and timestamp (UNIX time) selects the graph state
cv.download(..., agglomerate=True, stop_layer=None, timestamp=None)
cv.agglomerate = True # change the default download agglomerate mode to True
cv[:] # slicing downloads are controlled by cv.agglomerate

# skips using the graph server and tries to find
# the right meshes by guessing
cv.mesh.get(..., bypass=True/False)
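
A rough usage sketch follows. The cloudpath, bounding box, and timestamp are made up; a real graphene path points at a PyChunkGraph server and typically requires authentication:

from cloudvolume import CloudVolume, Bbox

# hypothetical graphene datasource; substitute your own PyChunkGraph endpoint
cv = CloudVolume("graphene://https://example.com/segmentation/table/example_dataset")

bbox = Bbox((0, 0, 0), (256, 256, 64))  # made-up region at cv.mip resolution

# raw watershed labels, no proofreading graph applied
watershed = cv.download(bbox, agglomerate=False)

# labels remapped to their proofread root IDs as of a given moment in time
roots = cv.download(bbox, agglomerate=True, timestamp=1595000000)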

64-bit Label Structure

[Figure: bit layout of a 64-bit graphene label. Image credit: Sven Dorkenwald]

The 8 bits in the layer field are mostly wasted unless we get to human brain scale. A label with the segid section zeroed out is called a "chunk_id". X, Y, & Z together are called the "chunk_position". A uint64 containing only X, Y, Z shifted all the way to the right is called a "chunk_position_number". The segid field may contain hidden counters inside it.
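
A small sketch of how these fields could be unpacked. The layer width comes from the diagram above; the X, Y, and Z widths below are made up, as the real widths are dataset specific and come from the graphene info file:

LAYER_BITS = 8                        # per the diagram above
X_BITS, Y_BITS, Z_BITS = 10, 10, 10   # made-up widths; real values are dataset specific
SEGID_BITS = 64 - LAYER_BITS - X_BITS - Y_BITS - Z_BITS

def decode_label(label):
    """Split a 64-bit graphene label into (layer, chunk_position, segid)."""
    layer = label >> (64 - LAYER_BITS)
    x = (label >> (64 - LAYER_BITS - X_BITS)) & ((1 << X_BITS) - 1)
    y = (label >> (64 - LAYER_BITS - X_BITS - Y_BITS)) & ((1 << Y_BITS) - 1)
    z = (label >> SEGID_BITS) & ((1 << Z_BITS) - 1)
    segid = label & ((1 << SEGID_BITS) - 1)
    return layer, (x, y, z), segid

def chunk_id(label):
    """The label with the segid field zeroed out."""
    return (label >> SEGID_BITS) << SEGID_BITS

def chunk_position_number(label):
    """X, Y, Z shifted all the way to the right as a single integer."""
    return (label >> SEGID_BITS) & ((1 << (X_BITS + Y_BITS + Z_BITS)) - 1)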

Levels

Each object label contains within it the layer it comes from. This is indicated, as above, by the first eight bits of the segment ID. The graph is assembled into a hierarchy of levels to make it possible to represent edits at multiple resolutions. Level numbering starts at 1 instead of 0 due to the PyChunkGraph's genesis in Julia, a 1-indexed language.

Our largest two datasets have 10 and 12 levels respectively.

| Level | Representation |
|-------|----------------|
| 0     | Not used. |
| 1     | Watershed |
| 2     | Intra-chunk agglomeration |
| 3     | First inter-chunk linkages w/ fan-out from each chunk |
| 4     | Fan-out from the level 3 graph. |
| ...   | ... |
| N     | "Root IDs", i.e. completely agglomerated objects |

Meshing

In order to control meshing costs, we are adopting a variation of the sharded format for the initial meshes. Meshes must be created for each level of the chunk graph in order to provide a granular substrate for creating dynamic meshes. The initial meshes are based on a single timepoint in the chunk graph server. The dynamic meshes that come after are generated in response to each proofreading action and are uploaded as individual mesh files (i.e. not sharded), since rewriting shards one mesh at a time would incur the same file creation cost as a simpler individual format. (It might be possible to aggregate multiple stitching levels into a single shard for something like an 8x savings on dynamic file creation costs, but these costs will probably be low as they are proportional to the number of proofreading edits on the dataset.)

The initial meshes will be created using zmesh and compressed using draco. All dynamic meshes can be generated by stitching initial or previous dynamic meshes. In order to properly fuse meshes across chunk boundaries, it's necessary to extend one pixel into a neighboring chunk. This requires a complex remapping operation to ensure the extension is mapped to the same segment ids as the originating chunk.
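
A minimal sketch of the per-chunk initial meshing step using zmesh's documented interface. The resolution and simplification parameters are made up, and the draco compression and cross-chunk remapping steps are omitted:

import numpy as np
from zmesh import Mesher

# labels: a 3D uint64 cutout of one chunk, extended one pixel into its
# neighbors so that meshes can later be fused across chunk boundaries
labels = np.zeros((257, 257, 513), dtype=np.uint64)  # placeholder data

mesher = Mesher((8, 8, 40))  # made-up nm resolution (anisotropy)
mesher.mesh(labels)

meshes = {}
for label in mesher.ids():
    meshes[label] = mesher.get_mesh(
        label,
        normals=False,
        simplification_factor=100,    # made-up reduction target
        max_simplification_error=40,  # made-up error bound in nm
    )
    mesher.erase(label)  # free memory as we go
mesher.clear()

# each mesh would then be draco compressed and packed into a shard file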

Dynamic meshes will be created on-demand as proofreading edits generate meshing requests. These requests will be fulfilled by a Real Time Stitcher (RTS) that is always running and checking for new requests. The results of this operation will be written into the dynamic meshing section of the mesh directory structure.

Directory Structure

meshes
   |- info       # contains sharding information
   |- initial
   |     |- 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12    # levels of the chunk graph
   |           |- `121951224-0ea.shard`, ...   (made up example, each shard contains many meshes)
   |- dynamic
         |- 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12    # levels of the chunk graph
               |- `737912791273`, ...   (made up example, each file contains one mesh)

The Sharded Format

With the exception of the lowest level of the graph server (L2), which can be meshed and written as a shard file directly, each higher level of meshing requires some degree of aggregation into multiple shards. For this reason we adapt the sharded file name to be $GRAPHENE_CHUNK_ID-$SHARD_NUMBER.shard, where $GRAPHENE_CHUNK_ID is the x,y,z bits of the label extracted and represented in base 10, and $SHARD_NUMBER is the standard shard number assigned by the Precomputed sharding scheme.
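
A sketch of assembling such a filename. The bit widths are made up (they are dataset specific), and the shard number is assumed to be supplied by the Precomputed sharding scheme rather than computed here:

LAYER_BITS = 8
X_BITS = Y_BITS = Z_BITS = 10   # made-up widths
SEGID_BITS = 64 - LAYER_BITS - X_BITS - Y_BITS - Z_BITS

def graphene_chunk_id(label):
    """The x,y,z bits of the label, extracted and represented in base 10."""
    return (label >> SEGID_BITS) & ((1 << (X_BITS + Y_BITS + Z_BITS)) - 1)

def shard_filename(label, shard_number):
    # shard_number: hex string assigned by the Precomputed sharding scheme
    return "{}-{}.shard".format(graphene_chunk_id(label), shard_number)

# e.g. shard_filename(label, "0ea") might yield "121951224-0ea.shard"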

As accessing sharded files is slower due to the need to consult indices, we would like to balance the number of labels in a shard file against the total number of shard files for a given (chunk, level) pair. A larger number of shards increases the number of cache misses in CloudVolume's and Neuroglancer's LRU caches for the fixed-size index. Too few shards and reading the minishard index becomes taxing.

As a rule of thumb, we should aim to make reading a single uncached mesh as close to perceptually instantaneous as possible on a 10 Mbps home connection (1 Gbps is the most common connection speed in the Princeton Neuroscience Institute). This means that all three requests (fixed index, minishard index, and mesh data) plus decompression should add up to less than 200 msec, a standard yardstick in HCI (humans can perceive smaller increments, but this is the approximate threshold of annoyance). This is a valid design goal because we wish to make proofreading a seamless experience, and loading meshes is core to the proofreading workflow. Batch analysis may require even higher throughput, but is less sensitive to individual mesh latency.

  • At 10 Mbps (1,250 kB/sec), it takes 200 msec to download 250 kB.
  • At 100 Mbps (12,500 kB/sec), it takes 200 msec to download 2.5 MB.
  • At 1 Gbps (125,000 kB/sec), it takes 200 msec to download 25 MB.

If ping is 12 msec, then 36 msec of round trip time is built into the three requests. This leaves 164 msec of transfer time (roughly 150 kiB at 10 Mbps). To download a small 100 kB mesh, that leaves about 50 kiB for fetching the indices. gzip decodes at around 18 MB/sec, so decompression should take about 5 msec. It is difficult to find reliable data on draco decode speeds; this blog post is the best I've found.
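
A back-of-the-envelope sketch of this budget at raw line rate (effective throughput will be somewhat lower, which is why the figures above are budgeted conservatively):

def transfer_budget_kb(bandwidth_mbps, ping_msec, num_requests=3, budget_msec=200):
    """kB downloadable within the latency budget after subtracting round trips."""
    transfer_msec = budget_msec - num_requests * ping_msec
    kb_per_sec = bandwidth_mbps * 1000 / 8   # e.g. 10 Mbps -> 1,250 kB/sec
    return transfer_msec / 1000 * kb_per_sec

print(transfer_budget_kb(10, 12))    # ~205 kB at 10 Mbps line rate
print(transfer_budget_kb(1000, 12))  # ~20,500 kB (20.5 MB) at 1 Gbps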

In order to keep the minishard indices to a reasonable size, we will need to progressively increase the number of shards and the number of minishards as the levels of the chunk graph are ascended. Since it makes sense to treat each L2 chunk as a single shard, we start at 0 shard bits. Given an L2 test region for fly, it seems that a 256x256x512 area contained ~6k labels. With a fan out of 2^3 per chunk level, and Z getting truncated at 7062 slices, I attempted to balance the caching chance against download size in the charts below.
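
A rough sketch of how the label counts in the charts below extrapolate from the L2 measurement. Treating the Z truncation as dropping the per-level fan out from 2^3 to 2^2 is my reading of the charts, not something stated explicitly:

def label_counts(l2_labels, max_level, z_truncation_level):
    """Extrapolate labels per chunk at each graph level from the L2 count."""
    counts = {2: l2_labels}
    for level in range(3, max_level + 1):
        # each level doubles the chunk extent per dimension (fan out 2^3),
        # but once Z is truncated only X and Y keep growing (fan out 2^2)
        fan_out = 4 if level >= z_truncation_level else 8
        counts[level] = counts[level - 1] * fan_out
    return counts

print(label_counts(6048, 10, 7))   # matches the FLY chart
print(label_counts(793, 12, 6))    # matches the MINNIE chart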

FLY

Based on the number of labels in a 256x256x512 L2 chunk, Fan Out = 2, and truncation in Z at 7062 slices.

| Level | Labels | S. Bits | MS. Bits | MS. kiB | Idx kiB | %Cached | Bits Left (64 allotted) | Notes |
|-------|--------|---------|----------|---------|---------|---------|-------------------------|-------|
| 2  | 6048       | 0  | 4  | 8.86  | 0.03125 | 100.0% | 60 | |
| 3  | 48384      | 0  | 6  | 17.72 | 0.125   | 100.0% | 58 | |
| 4  | 387072     | 0  | 9  | 17.72 | 1       | 100.0% | 55 | |
| 5  | 3096576    | 1  | 11 | 17.72 | 4       | 50.0%  | 52 | |
| 6  | 24772608   | 2  | 12 | 35.44 | 8       | 25.0%  | 50 | |
| 7  | 99090432   | 4  | 12 | 35.44 | 8       | 6.3%   | 48 | Z starts being truncated here |
| 8  | 396361728  | 8  | 11 | 17.72 | 4       | 0.4%   | 45 | |
| 9  | 1585446912 | 10 | 10 | 35.44 | 2       | 0.1%   | 44 | |
| 10 | 6341787648 | 10 | 12 | 35.44 | 8       | 0.1%   | 42 | |

MINNIE

Based on the number of labels in a 256x256x512 L2 chunk, Fan Out = 2, and truncation in Z at ~16k slices.

| Level | Labels | S. Bits | MS. Bits | MS. kiB | Idx kiB | Shard MiB | %Cached | Bits Left (64 allotted) | Notes |
|-------|--------|---------|----------|---------|---------|-----------|---------|-------------------------|-------|
| 2  | 793        | 0  | 1  | 9.29  | 0.004 | 0.43    | 100.0% | 63 | |
| 3  | 6344       | 0  | 3  | 18.59 | 0.016 | 3.45    | 100.0% | 61 | |
| 4  | 50752      | 0  | 6  | 18.59 | 0.125 | 27.58   | 100.0% | 58 | |
| 5  | 406016     | 0  | 8  | 37.17 | 0.5   | 220.63  | 100.0% | 56 | |
| 6  | 1624064    | 0  | 10 | 37.17 | 2     | 882.52  | 100.0% | 54 | Z starts being truncated here |
| 7  | 6496256    | 2  | 10 | 37.17 | 2     | 882.52  | 25.0%  | 52 | |
| 8  | 25985024   | 3  | 11 | 37.17 | 4     | 1765.04 | 12.5%  | 50 | |
| 9  | 103940096  | 5  | 11 | 37.17 | 4     | 1765.04 | 3.1%   | 48 | |
| 10 | 415760384  | 7  | 11 | 37.17 | 4     | 1765.04 | 0.8%   | 46 | |
| 11 | 1663041536 | 9  | 11 | 37.17 | 4     | 1765.04 | 0.2%   | 44 | |
| 12 | 6652166144 | 11 | 11 | 37.17 | 4     | 1765.04 | 0.0%   | 42 | |

Key:
Labels: The number of individual labels in a given chunk.
S. Bits: 2^sbits = the number of shards per chunk.
MS. Bits: 2^msbits = the number of minishards in each shard.
MS. kiB: The estimated size of a minishard index in kibibytes.
Idx kiB: The size of the fixed index that must be read first (and is cacheable), in kibibytes.
%Cached: The chance that the second request of a random label in the chunk encounters a cached fixed index, avoiding a request.
Shard MiB: Estimated size of the resulting shard file in mebibytes.
Bits Left: Number of bits remaining in a 64-bit integer for allocating to preshift bits, shard bits, and minishard bits.
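
As a sanity check, the %Cached column in both charts is consistent with a second random label hitting the cached fixed index only when it falls in the same shard, i.e. with probability 1/2^sbits:

def percent_cached(shard_bits):
    """Chance a second random label in the chunk shares the cached fixed index."""
    return 100.0 / 2 ** shard_bits

print([round(percent_cached(b), 2) for b in (0, 2, 4, 8, 10)])
# [100.0, 25.0, 6.25, 0.39, 0.1]  -- matching 100.0%, 25.0%, 6.3%, 0.4%, 0.1% above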

In an interesting twist, it seems that a faster connection would result in even more efficient downloads than expected. The maximum affordable index size rises with connection speed, which permits the use of fewer shards and hence makes more effective use of the cache.

Furthermore, the conflicting optimization goals in the charts above (increasing cache hits while reducing the size of the fixed and minishard indices) and their dependence on image statistics and dataset boundaries make it difficult to formulate a mechanical formula that can be applied to all datasets at this time. Instead, each dataset will require a hand tuned sharding strategy for now. We'll need to provide the per level sharding parameters in the graphene mesh info file.

"sharding": {
   2: { ... sharding params ... },
   3: { ... },
   ...
   N: { ... },
}
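
A hedged illustration of what one level's entry might contain, borrowing the parameter names from the Precomputed sharding specification (the particular values below are illustrative, not prescriptive):

"sharding": {
   "2": {
      "@type": "neuroglancer_uint64_sharded_v1",
      "preshift_bits": 0,
      "hash": "murmurhash3_x86_128",
      "minishard_bits": 4,
      "shard_bits": 0,
      "minishard_index_encoding": "gzip",
      "data_encoding": "gzip"
   },
   ...
}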