This repository contains a PyTorch implementation of Graph Attention Networks (GAT), based on the paper "Graph Attention Networks" by Veličković et al.
The Graph Attention Network is a powerful graph neural network model for learning representations on graph-structured data, and it has shown excellent performance in tasks such as node classification, link prediction, and graph classification.
This is a modular version of the GAT example I wrote for PyTorch Examples.
The Graph Attention Network (GAT) is a graph neural network architecture designed for graph-structured data. It uses a multi-head attention mechanism to gather information from neighboring nodes and learn a representation for each node. The attention mechanism lets the model focus on relevant neighbors and adaptively weight their contributions during message passing.
Check out the following resources for more info on GATs:
This repository provides a clean and short implementation of the official GAT model using PyTorch. The code is well-documented and easy to understand, making it a valuable resource for researchers and practitioners interested in graph deep learning.
- GAT Model: Implementation of the Graph Attention Network model with multi-head attention, based on the paper "Graph Attention Networks" by Veličković et al.
- Graph Attention Layers: Implementation of graph attention layers that aggregate information from neighboring nodes, using a self-attention mechanism to learn node importance weights.
- Training and Evaluation: Code for training GAT models on graph-structured data and evaluating their performance on node classification tasks on the Cora benchmark dataset.
- Python 3.7 or higher
- PyTorch 2.0 or higher
- Requests 2.31 or higher
- NumPy 1.24 or higher
The implementation includes support for the Cora dataset, a standard benchmark for graph-based machine learning. The Cora dataset consists of scientific publications, where nodes represent papers and edges represent citation relationships. Each paper is described by a binary bag-of-words feature vector and labeled with one of seven classes. The dataset is downloaded and preprocessed automatically, so it is ready to use.
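To make the graph structure concrete, here is a minimal sketch of turning a list of citation edges into a dense adjacency matrix. The helper name `build_dense_adjacency`, the toy edge list, and the choices to symmetrize the edges and add self-loops are assumptions made for illustration; the preprocessing in this repository may differ.

```python
import torch


def build_dense_adjacency(edges, num_nodes):
    """Build a symmetric 0/1 adjacency matrix with self-loops from (src, dst) pairs.

    Hypothetical helper for illustration, not the repository's loader.
    """
    adj = torch.zeros(num_nodes, num_nodes)
    idx = torch.tensor(edges, dtype=torch.long)
    adj[idx[:, 0], idx[:, 1]] = 1.0  # citation edge
    adj[idx[:, 1], idx[:, 0]] = 1.0  # assumption: treat citations as undirected
    adj.fill_diagonal_(1.0)          # assumption: self-loops so a node attends to itself
    return adj


# Toy graph with 4 papers and 3 citation links (made-up data, not Cora).
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_adj = build_dense_adjacency(toy_edges, num_nodes=4)
```

On Cora the same idea applies with one row and one column per paper.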
The official architecture proposed in the paper "Graph Attention Networks" by Veličković et al. (and used in this project) consists of two graph attention layers, which use multi-head attention during message transformation and aggregation. Each graph attention layer applies a shared self-attention mechanism to every node in the graph, so each node learns a representation weighted by the importance of its neighbors.
The GAT model uses two activation functions to introduce non-linearity. The Exponential Linear Unit (ELU) is applied to the outputs of the hidden layer, while the Leaky Rectified Linear Unit (LeakyReLU) is applied to the raw attention scores before they are normalized, which keeps gradients non-zero for negative scores.
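The attention computation described above can be sketched as a single dense attention head. This is an illustrative sketch, not the code in this repository: the class name `DenseGraphAttentionHead` and the parameter names `a_src` and `a_dst` are hypothetical, and the LeakyReLU slope of 0.2 is the value used in the paper. It assumes every node has at least one neighbor (for example a self-loop); otherwise the softmax row would contain only masked entries.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseGraphAttentionHead(nn.Module):
    """One attention head over a dense adjacency matrix (illustrative sketch)."""

    def __init__(self, in_features, out_features, negative_slope=0.2):
        super().__init__()
        self.W = nn.Linear(in_features, out_features, bias=False)
        # The attention vector a is split into a source half and a target half.
        self.a_src = nn.Parameter(torch.empty(out_features, 1))
        self.a_dst = nn.Parameter(torch.empty(out_features, 1))
        nn.init.xavier_uniform_(self.a_src)
        nn.init.xavier_uniform_(self.a_dst)
        self.leaky_relu = nn.LeakyReLU(negative_slope)

    def forward(self, h, adj):
        z = self.W(h)  # (N, F') transformed node features
        # e[i, j] = LeakyReLU(a_src . z_i + a_dst . z_j), built by broadcasting
        e = self.leaky_relu(z @ self.a_src + (z @ self.a_dst).T)  # (N, N)
        e = e.masked_fill(adj == 0, float("-inf"))  # ignore non-neighbors
        alpha = F.softmax(e, dim=1)                 # normalize over neighbors j
        return alpha @ z, alpha                     # aggregated features, weights


torch.manual_seed(0)
h = torch.randn(4, 5)                 # 4 nodes, 5 input features (toy data)
adj = torch.eye(4)                    # self-loops
adj[0, 1] = adj[1, 0] = 1.0           # one undirected edge
head = DenseGraphAttentionHead(in_features=5, out_features=8)
out, alpha = head(h, adj)
hidden = F.elu(out)                   # ELU on the hidden-layer output
```

Masking with negative infinity before the softmax gives non-neighbors a weight of exactly zero, so each row of `alpha` is a distribution over that node's neighbors only.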
Following the official implementation, the first GAT layer consists of K = 8 attention heads computing F' = 8 features each (64 features in total), followed by an exponential linear unit (ELU) activation on the layer outputs. The second GAT layer is used for classification: a single attention head computes C features (where C is the number of classes), followed by a softmax activation for probabilistic outputs. This implementation uses log-softmax instead, because it pairs directly with NLLLoss.
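The shape bookkeeping and the log-softmax choice can be checked with a short sketch. The tensors below are random stand-ins for the real layer outputs, and the `nn.Linear` layer only stands in for the second attention layer so that the shapes line up; none of this is the repository's model code. The node count of 5 is arbitrary, and C = 7 matches the seven Cora classes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_nodes, num_heads, feats_per_head, num_classes = 5, 8, 8, 7

# Stand-ins for the outputs of the 8 first-layer heads (random, shapes only).
head_outputs = [torch.randn(num_nodes, feats_per_head) for _ in range(num_heads)]
hidden = F.elu(torch.cat(head_outputs, dim=1))  # concatenation gives (5, 64)

# Stand-in for the single-head output layer: any map from 64 to C features.
output_layer = torch.nn.Linear(num_heads * feats_per_head, num_classes)
logits = output_layer(hidden)                   # (5, 7)

log_probs = F.log_softmax(logits, dim=1)
labels = torch.randint(0, num_classes, (num_nodes,))

loss_nll = F.nll_loss(log_probs, labels)        # log-softmax followed by NLL loss
loss_ce = F.cross_entropy(logits, labels)       # the same quantity in one call
```

Exponentiating `log_probs` recovers the softmax probabilities, so no information is lost by returning log-probabilities from the model.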
Note that, because this is an educational example, the implementation uses the full dense adjacency matrix of the graph rather than a sparse representation, so all operations in the model are performed in dense form. This does not affect the model's accuracy. However, a sparse-friendly implementation would use less memory and storage and would run faster.
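As a rough illustration of the trade-off, the sketch below compares the number of stored entries in a dense adjacency matrix and in its sparse COO form. The ring graph and its size are made up for the example and have nothing to do with Cora; the point is only that both forms give the same aggregation result.

```python
import torch

num_nodes = 1000

# Hypothetical ring graph: node i is linked to node (i + 1) % num_nodes.
src = torch.arange(num_nodes)
dst = (src + 1) % num_nodes
dense_adj = torch.zeros(num_nodes, num_nodes)
dense_adj[src, dst] = 1.0
dense_adj[dst, src] = 1.0

sparse_adj = dense_adj.to_sparse().coalesce()   # COO format, nonzeros only

dense_entries = dense_adj.numel()               # every cell is stored
stored_entries = sparse_adj.values().numel()    # only the nonzero cells

# Neighborhood aggregation gives the same result in both forms.
x = torch.randn(num_nodes, 16)
agg_dense = dense_adj @ x
agg_sparse = torch.sparse.mm(sparse_adj, x)
```

The dense matrix grows quadratically with the number of nodes, while the sparse form grows with the number of edges.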
To train and evaluate the GAT model on the Cora dataset, run the train.py script as follows:
- Clone the repository:
git clone https://github.com/ebrahimpichka/GAT-pt.git
cd GAT-pt/
- Install the required dependencies:
pip install -r requirements.txt
- Train the GAT model by running the train.py script as follows (example with explicit hyperparameter values):
python train.py --epochs 300 --lr 0.005 --l2 5e-4 --dropout-p 0.6 --num-heads 8 --hidden-dim 64 --val-every 20
In more detail, the train.py script accepts the following arguments:
usage: train.py [-h] [--epochs EPOCHS] [--lr LR] [--l2 L2] [--dropout-p DROPOUT_P] [--hidden-dim HIDDEN_DIM] [--num-heads NUM_HEADS] [--concat-heads] [--val-every VAL_EVERY]
[--no-cuda] [--no-mps] [--dry-run] [--seed S]
PyTorch Graph Attention Network
options:
-h, --help show this help message and exit
--epochs EPOCHS number of epochs to train (default: 300)
--lr LR learning rate (default: 0.005)
--l2 L2 weight decay (default: 6e-4)
--dropout-p DROPOUT_P
dropout probability (default: 0.6)
--hidden-dim HIDDEN_DIM
dimension of the hidden representation (default: 64)
--num-heads NUM_HEADS
number of the attention heads (default: 4)
--concat-heads wether to concatinate attention heads, or average over them (default: False)
--val-every VAL_EVERY
epochs to wait for print training and validation evaluation (default: 20)
--no-cuda disables CUDA training
--no-mps disables macOS GPU training
--dry-run quickly check a single pass
--seed S random seed (default: 13)
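The `--no-cuda`, `--no-mps`, and `--seed` options suggest the usual PyTorch pattern for choosing a device and fixing the random seed. The sketch below shows that pattern under the assumption that the script follows it; the function name `select_device` is hypothetical and is not taken from train.py.

```python
import torch


def select_device(no_cuda=False, no_mps=False):
    """Pick CUDA, then MPS, then CPU, honoring the two opt-out flags.

    Hypothetical helper that mirrors the --no-cuda and --no-mps options.
    """
    use_cuda = not no_cuda and torch.cuda.is_available()
    use_mps = not no_mps and torch.backends.mps.is_available()
    if use_cuda:
        return torch.device("cuda")
    if use_mps:
        return torch.device("mps")
    return torch.device("cpu")


torch.manual_seed(13)                 # same value as the documented --seed default
device = select_device()              # best available device
cpu_only = select_device(no_cuda=True, no_mps=True)
```

Passing both opt-out flags always falls back to the CPU, which is useful for checking a run on a machine without a GPU.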
After training for 300 epochs with the default hyperparameters on random train/val/test splits, the GAT model achieves around 81.25% classification accuracy on the test split. This is comparable to the performance reported in the original paper (83.0 ± 0.7% on Cora). However, results can vary because the train/val/test split is random.
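Because the reported accuracy depends on the random split, fixing the seed makes a run repeatable. The sketch below shows one way to draw a seeded random split of node indices. The function name `random_split_indices` and the 60/20/20 proportions are assumptions for illustration; the proportions used by this repository are not stated here.

```python
import torch


def random_split_indices(num_nodes, train_frac=0.6, val_frac=0.2, seed=13):
    """Return disjoint train/val/test index tensors from a seeded permutation.

    Hypothetical helper; the fractions are illustrative, not the repository's.
    """
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=generator)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    train_idx = perm[:n_train]
    val_idx = perm[n_train:n_train + n_val]
    test_idx = perm[n_train + n_val:]   # whatever remains goes to the test split
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = random_split_indices(100, seed=13)
train_again, _, _ = random_split_indices(100, seed=13)
```

Running the split with several seeds and averaging the test accuracy gives a more stable estimate than a single run.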
@article{
velickovic2018graph,
title="{Graph Attention Networks}",
author={Veli{\v{c}}kovi{\'{c}}, Petar and Cucurull, Guillem and Casanova, Arantxa and Romero, Adriana and Li{\`{o}}, Pietro and Bengio, Yoshua},
journal={International Conference on Learning Representations},
year={2018},
url={https://openreview.net/forum?id=rJXMpikCZ},
}
- Paper on arXiv: arXiv:1710.10903v3
- Original paper repository: https://github.com/PetarV-/GAT