The Economics: Skeletons for the People

Kimimaro is designed to mass produce skeletons at low compute cost. This is important because traditionally, skeleton generation has been regarded as an expensive operation. Kimimaro is fast on a per-label basis, but it also runs certain algorithms, such as connected components and the distance transform, on all labels at once, resulting in speedups of multiple orders of magnitude.
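For reference, invoking Kimimaro directly on a single labeled cutout looks roughly like the sketch below. The TEASAR parameters and anisotropy are placeholder values, not tuned recommendations.

```python
import numpy as np
import kimimaro

# labels: a 3D integer segmentation cutout; here a stand-in empty volume.
labels = np.zeros((512, 512, 512), dtype=np.uint32)

skels = kimimaro.skeletonize(
    labels,
    teasar_params={"scale": 4, "const": 500},  # placeholder TEASAR settings
    anisotropy=(32, 32, 40),                   # nm per voxel, e.g. MIP 3
    parallel=1,                                # cores to use
)
# skels: dict mapping label id -> skeleton fragment
```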

For a petascale image, we run two phases to produce complete objects. First, we skeletonize all labels in a grid of cutouts that overlap by a single axial voxel. Second, we take the resulting fragments and stitch them together. Since writing hundreds of millions of files incurs a substantial cost, we've also integrated into the second step a grouping procedure that writes out batches of skeletons into Neuroglancer “shard” files while retaining the random access property.
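A rough sketch of queueing the two phases with Igneous follows. The function names and parameters used here (create_skeletonizing_tasks, create_sharded_skeleton_merge_tasks, the bucket path) are assumptions from memory rather than a verified recipe, so check the Igneous repository for the current API.

```python
# Hypothetical sketch: verify signatures against the Igneous documentation.
from taskqueue import LocalTaskQueue
import igneous.task_creation as tc

cloudpath = "gs://example-bucket/segmentation"  # placeholder path

tq = LocalTaskQueue(parallel=8)

# Phase 1: skeletonize every label in overlapping 512^3 cutouts.
tasks = tc.create_skeletonizing_tasks(
    cloudpath, mip=3, shape=(512, 512, 512), sharded=True,
)
tq.insert(tasks)

# Phase 2: stitch the fragments and group them into Neuroglancer shard files.
# Additional parameters (dust/tick thresholds, shard sizing) are elided here.
tasks = tc.create_sharded_skeleton_merge_tasks(cloudpath)
tq.insert(tasks)
```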

Here's an example calculation of the cost of running Kimimaro on a petavoxel of connectomics data at MIP 3 on Google Compute Engine + Google Cloud Storage. Prices for AWS would be similar but are less predictable because of spot pricing. The 30 min. estimate for primary skeletonization below is based on a Jan. 2020 large-scale run that occurred prior to several speedups; we have some evidence that those improvements are worth about 1.75x. Note that all kinds of things can invalidate these assumptions, such as misconfiguring the pipeline, which can multiply the cost considerably.

PRIMARY SKELETONIZATION

1 Petavoxel = 200,000 x 200,000 x 25,000 voxels at 4x4x40 nm resolution  
MIP 3 (typically 32x32x40 nm) =  25,000 x 25,000 x 25,000 voxels = 15.6 TVx  

Using 512x512x512 voxel tasks = 116,416 tasks  
  
Cloud computing using preemptible highmem instances ≈ $0.0135 per vCPU-hour  
Typical task time ≈ 30 min (based on large scale run in Jan. 2020, data dependent)  

Core Hours: (116,416 tasks) * (30 min / 60 min/hr)
	= 58,208 core-hours
Compute Time Cost ≈ (58,208 core-hours) * ($0.0135 / core-hr)
	≈ $786
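The same arithmetic as a quick back-of-the-envelope script, using the estimates above rather than measured values:

```python
import math

mip3_voxels = 25_000 ** 3                     # 15.6 TVx at MIP 3
task_voxels = 512 ** 3                        # one 512^3 cutout per task
tasks = math.ceil(mip3_voxels / task_voxels)  # 116,416 tasks

hours_per_task = 0.5                          # 30 min/task (Jan. 2020 run)
price_per_core_hr = 0.0135                    # preemptible highmem $/vCPU-hr

core_hours = tasks * hours_per_task           # 58,208 core-hours
cost = core_hours * price_per_core_hr         # ≈ $786
print(f"{tasks:,} tasks, {core_hours:,.0f} core-hrs, ${cost:,.0f}")
```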

For the merging step, we generated a sharded Neuroglancer volume and downloaded the data for processing onto a local cluster running a GPFS filesystem. From a cost minimization perspective, it is very important to opt for sharded processing, as the large number of files generated (one per label) incurs about $5 per million files, which becomes substantial when hundreds of millions or billions of files are contemplated. It was important to run this step on a cluster with a POSIX filesystem because we condensed the JSON spatial index into a sqlite3 database for fast lookups and were able to randomly access the fragments generated in the first step using mmap. The sqlite3 speedup is worth an order of magnitude; the mmap speedup is between 2-3x.
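A hypothetical sketch of condensing a JSON spatial index into sqlite3 is shown below; the table layout, file naming, and JSON structure are illustrative, not the actual Igneous schema.

```python
import json
import sqlite3
from glob import glob

# Hypothetical schema: map each label to the spatial index file that contains it.
conn = sqlite3.connect("spatial_index.db")
conn.execute("CREATE TABLE IF NOT EXISTS index_files (label INTEGER, filename TEXT)")

for fname in glob("*.spatial"):   # illustrative file pattern
    with open(fname) as f:
        entries = json.load(f)    # e.g. { "label_id": bounding_box, ... }
    conn.executemany(
        "INSERT INTO index_files (label, filename) VALUES (?, ?)",
        ((int(label), fname) for label in entries),
    )

conn.execute("CREATE INDEX IF NOT EXISTS idx_label ON index_files (label)")
conn.commit()

# Lookups become a single indexed query instead of scanning every JSON file.
rows = conn.execute("SELECT filename FROM index_files WHERE label = ?", (12345,))
```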

It is possible to run sharded merging in the cloud, but it is currently much less efficient. With additional engineering, cloud instances could use a MySQL database and access the fragments using HTTP Range reads. However, this only becomes a major factor at petascale. For datasets about 10x smaller, the default processing mode is sufficient, if somewhat inefficient. For datasets with 10 million labels or fewer, non-sharded processing is financially viable (~$50-$100 in file creation costs).

SHARDED MERGING STEP

Egress of 883 GB of skeletons + metadata to local cluster:

	883 GB * ($0.08 to $0.12 / GB) ≈ $71 to $105 egress charge

For the calculation below, we’ll assume the local cluster’s time is priced 
according to GCP preemptible instances even though it ran on institutional 
resources.

There were 524,288 shard generation tasks. This is more than the number of 
step one tasks because each skeleton is computationally intensive to merge, 
so splitting the work more finely keeps task completion times reasonable.

Aside: Our notes did not include the timings after we introduced mmap, 
only an inferred ratio based on profiler info, so the times might be 
closer to 10-15 sec/task.

Core Hours: 524,288 tasks * 30 sec/task ÷ 3600 sec/hr
	= 4,369 core-hours
Compute Cost: (4,369 core-hours) * ($0.0135 / core-hr)
	= $59

Ingress back to GCP is essentially free.

The volume this processing was conducted on was about 65% of a petascale 
volume, so we adjust accordingly.

Merging cost: (1/.65) * ($59 + $105) = $252
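The merging arithmetic in the same back-of-the-envelope style, using the rounded figures above:

```python
# Back-of-the-envelope reproduction of the sharded merging estimate.
egress_cost = 883 * 0.12               # ≈ $105 egress at the high end
core_hours = 524_288 * 30 / 3600       # ≈ 4,369 core-hours at 30 sec/task
compute_cost = core_hours * 0.0135     # ≈ $59

# The processed volume was ~65% of a petavoxel, so scale up to full size.
merging_cost = (compute_cost + egress_cost) / 0.65
print(f"≈ ${merging_cost:,.0f}")       # ≈ $252-254 depending on rounding
```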

TOTAL COST:

	Primary Skeletonization:    ~$786
	Merging + Shard Generation: ~$252
	—————————————————————————————————
	Total:                      ~$1,038
	w/ undocumented speedups:   ~$640 (maybe)

As you can see, with a sophisticated approach, the skeletonization procedure can be completed at scale for a very reasonable sum, which works out to roughly 0.9-2 TVx/$ measured against the full resolution volume. The cost is mainly influenced by the shape of the data and the number of labels, however, so perhaps that isn’t the best measure. The computation can take several days and requires some human monitoring and intervention.

Of course, your mileage may vary. Make sure you perform experiments on your own hardware and use prices applicable to your setup.

Lastly, these are example calculations and not any kind of guarantee. Process petascale datasets at your own risk!

The code used for running the procedure can be found in Igneous:

https://github.com/seung-lab/igneous
