Shared-memory Parallel Clustering

Organization

This repository contains the implementations for our shared-memory parallel and sequential algorithms for correlation and modularity clustering. Note that the repository uses the Graph-Based Benchmark Suite (GBBS) for parallel primitives and benchmarks.

clustering contains the main executable, cluster-in-memory_main. It serves as a central runner for all of our implementations.

Installation

Compiler:

g++ >= 7.4.0 with support for Cilk Plus
g++ >= 7.4.0 with pthread support for the ParlayLib scheduler

Build system:

Bazel >= 3.5.0 and < 4.0.0

To build:

$ bazel build //clustering:cluster-in-memory_main

Most optionality from the Graph Based Benchmark Suite (GBBS) applies. In particular, to compile benchmarks for graphs with more than 2^32 edges, the LONG command-line parameter should be set.

Also, the parallel scheduler can be chosen using comand-line parameters. The default compilation is the ParlayLib scheduler (Homegrown), and the parameters HOMEGROWN, CILK, OPENMP, and SERIAL switch between the Homegrown scheduler, Cilk Plus, OpenMP, and serial respectively.

Graph Format

The applications take as input the adjacency graph format used by Graph Based Benchmark Suite (GBBS).

Configurations

All clusterers take configs given in a protobuf, as detailed in config.proto, which can be passed in by setting the input clusterer_config. Note that the proto should be passed in text format. For both correlation and modularity clustering, a CorrelationClustererConfig should be passed in.

The options available in CorrelationClustererConfig are:

resolution: double indicating resolution parameter
louvain_config: LouvainConfig object, with
- num_iterations: int specifying the number of iterations of best moves and compression steps
- num_inner_iterations: int specifying the number of iterations of local best move steps
refine: bool indicating whether to perform multi-level refinement
async: bool indicating whether to use the asynchronous setting
move_method: MoveMethod enum, that sets the subset of vertices to consider for best moves to be:
- NBHR_CLUSTER_MOVE: neighbors of clusters that have been modified
- ALL_MOVE: all vertices
- NBHR_MOVE: neighbors of vertices that have moved
all_iter: bool solely for the sequential implementations; indicates if the restriction on the number of iterations should be ignored and if the program should run to convergence
permute: bool solely for the parallel implementations; indicates whether to randomly permute vertices before performing best moves, which may improve performance when there is greater contention from the asynchronous setting

Example Usage

The main executable is cluster-in-memory_main in the clustering directory. It will run the clusterer given by the input clusterer_name: "ParallelCorrelationClusterer", "ParallelModularityClusterer", "CorrelationClusterer", or "ModularityClusterer".

A template command is:

bazel run -c opt  //clustering:cluster-in-memory_main -- --input_graph=</path/to/graph> --output_clustering=</path/to/output> --clusterer_name=<clusterer name> --clusterer_config='<clusterer proto>'

An example <clusterer proto> is correlation_clusterer_config {resolution: 0.01, refine: true, async: true, move_method: NBHR_MOVE}.

Paper

The full version of the paper can be found here.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
clustering		clustering
parallel		parallel
.bazelrc		.bazelrc
.gitignore		.gitignore
Parallel_Correlation_Clustering__Scalable_Data_Science.pdf		Parallel_Correlation_Clustering__Scalable_Data_Science.pdf
README.md		README.md
WORKSPACE		WORKSPACE

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Shared-memory Parallel Clustering

Organization

Installation

Graph Format

Configurations

Example Usage

Paper

About

Releases

Packages

Languages

jeshi96/parallel-correlation-clustering

Folders and files

Latest commit

History

Repository files navigation

Shared-memory Parallel Clustering

Organization

Installation

Graph Format

Configurations

Example Usage

Paper

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages