NeutronTP is a load-balanced and efficient distributed full-graph GNN training system with GNN tensor parallelism:
- NeutronTP utilizes tensor parallelism for distributed GNN training, eliminating cross-worker vertex dependencies by partitioning features instead of graph structures (a minimal sketch of this idea follows this list).
- NeutronTP employs a generalized decoupled training method to separate NN operations from graph aggregation operations, significantly reducing communication volume and frequency in GNN tensor parallelism.
- NeutronTP employs a memory-efficient subgraph scheduling strategy to support large-scale graph processing and to overlap communication and computation tasks.
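The core idea can be illustrated with a short PyTorch sketch. This is a minimal illustration of feature-dimension partitioning, not NeutronTP's actual implementation; the function and variable names are hypothetical:

# Minimal sketch of GNN tensor parallelism: every worker replicates the
# graph structure but holds only a column slice of the vertex features,
# so sparse aggregation has no cross-worker vertex dependency and
# communication is confined to the dense NN step.
import torch
import torch.distributed as dist

def tp_gnn_layer(A, H_slice, W):
    # A:       [V, V] sparse adjacency, identical replica on every worker
    # H_slice: [V, hidden / world_size] local column slice of the features
    # W:       [hidden, out] dense layer weight
    Z_slice = torch.sparse.mm(A, H_slice)   # aggregation: fully local
    world_size = dist.get_world_size()
    parts = [torch.empty_like(Z_slice) for _ in range(world_size)]
    dist.all_gather(parts, Z_slice)         # reassemble the full hidden dim
    Z = torch.cat(parts, dim=1)             # [V, hidden]
    return torch.relu(Z @ W)                # NN operation on full features

Because the adjacency is replicated and only feature columns are split, no remote vertex features are ever fetched; the collective around the NN step is exactly the communication that the decoupled training method reduces.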
Currently, NeutronTP is under refactoring. We will release all features of NeutronTP soon.
- Set up a clean environment.
conda create --name NTP
conda activate NTP
- Install PyTorch (needed for training) and other libraries (needed for downloading datasets).
# CUDA 10.2:
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch-lts
conda install -c dglteam dgl-cuda10.2
conda install pyg -c pyg -c conda-forge
pip install ogb
# CUDA 11.1:
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c nvidia
conda install -c dglteam dgl-cuda11.1
conda install pyg -c pyg -c conda-forge
pip install ogb
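Optionally, verify that the GPU builds import cleanly and CUDA is visible:
python -c "import torch, dgl; print(torch.__version__, torch.cuda.is_available())"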
- Compile and install the SpMM extension. (Optional; a CUDA development environment is needed.)
cd spmm_cpp
python setup.py install
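A quick way to confirm the build succeeded (assuming the extension installs under the name spmm_cpp, matching the directory):
python -c "import spmm_cpp"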
- Prepare datasets (edit the code according to your needs).
# This may take a while.
python prepare_data.py
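For reference, the download step relies on the libraries installed above; prepare_data.py is the authoritative script, and the dataset names below are only examples:

# Example only: fetch datasets with DGL and OGB; edit prepare_data.py
# for the actual preprocessing used by NeutronTP.
from dgl.data import RedditDataset
from ogb.nodeproppred import DglNodePropPredDataset

reddit = RedditDataset()                                 # downloads on first use
products = DglNodePropPredDataset(name="ogbn-products")  # example OGB dataset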
- Train.
python main.py --nprocs=1 --nodes=16 --nlayers=2 --hidden=256 --epoch=100 --backend=nccl --dataset=reddit --model=GCN
--nprocs | the number of GPUs per node
--nodes | the number of nodes
--nlayers | the number of model layers
--hidden | the dimension of the hidden layers
--epoch | the number of training epochs
--backend | the communication backend (e.g., nccl)
--dataset | the input graph dataset
--model | the GNN model to train
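For a quick single-node debugging run, the same flags can be reused with smaller values; gloo below is a standard torch.distributed backend, but whether main.py accepts it is an assumption:
python main.py --nprocs=1 --nodes=1 --nlayers=2 --hidden=256 --epoch=10 --backend=gloo --dataset=reddit --model=GCN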
For technical questions, please contact Xin Ai ([email protected]) or Hao Yuan ([email protected]).