
CS224W - Adds TransD KGE, and Bernoulli corruption strategy for all KGE #9864

Closed · wants to merge 39 commits

Conversation


@mattjhayes3 mattjhayes3 commented Dec 15, 2024

Implements TransD and Bernoulli corruption strategy (used in TransD and TransH papers).
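For reference, TransD projects each entity with a relation-specific mapping M_re = r_p e_p^T + I before applying the usual translation score. A minimal pure-Python sketch for equal entity/relation dimensions (function names are mine for illustration, not the PR's API; the paper scores with an L1/L2 norm, squared L2 is used here for simplicity):

```python
def transd_project(e, e_p, r_p):
    # M_re @ e, where M_re = r_p e_p^T + I  ==>  (e_p . e) * r_p + e
    dot = sum(a * b for a, b in zip(e_p, e))
    return [dot * rp + ev for rp, ev in zip(r_p, e)]

def transd_score(h, h_p, t, t_p, r, r_p):
    # Negative squared L2 distance ||h_perp + r - t_perp||^2
    h_perp = transd_project(h, h_p, r_p)
    t_perp = transd_project(t, t_p, r_p)
    return -sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp))
```

With zero projection vectors the mapping reduces to the identity, so the score collapses to plain TransE on the raw embeddings.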

Details

  • Did not see any existing PRs for these yet :)
  • Used the KGEModel base class and tried to stay consistent with the other KGE models
  • Tried to implement everything as efficiently as possible while following the paper
  • Incorporated both into examples/kge_fb15k_237.py, also adding a duration calculation there
  • The Bernoulli parameters are currently computed for each batch rather than once for the whole training set
    • This is a little slower but worked much better in terms of metrics
    • It also seems more intuitive for the user to simply flip a bool than to require them to pass the whole training set at initialization or to call loader()
    • We might also try accumulating the statistics during training, but these initial experiments suggest this would be the worst of both worlds (slower, since we won't know when the first epoch has ended, and ultimately similar to precomputing in terms of metrics)
  • Happy to split Bernoulli into a separate PR, or make any other edits you might prefer.
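For context, the Bernoulli strategy (introduced in the TransH paper) corrupts the head of a triple with probability tph / (tph + hpt), where tph and hpt are, per relation, the average number of distinct tails per head and heads per tail. A pure-Python sketch of computing these statistics over a batch (function and variable names are mine for illustration, not the PR's API):

```python
from collections import defaultdict

def bernoulli_head_probs(triples):
    """For each relation r, P(corrupt head) = tph / (tph + hpt)."""
    tails_of = defaultdict(set)  # (r, h) -> distinct tails
    heads_of = defaultdict(set)  # (r, t) -> distinct heads
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)
    probs = {}
    for r in {r for _, r, _ in triples}:
        tph_list = [len(ts) for (rr, _), ts in tails_of.items() if rr == r]
        hpt_list = [len(hs) for (rr, _), hs in heads_of.items() if rr == r]
        tph = sum(tph_list) / len(tph_list)  # avg tails per head
        hpt = sum(hpt_list) / len(hpt_list)  # avg heads per tail
        probs[r] = tph / (tph + hpt)
    return probs
```

Intuitively, for one-to-many relations (high tph) corrupting the head is less likely to produce a false negative, so it is preferred; many-to-one relations get the opposite treatment.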

Benchmarks

I was only able to compare against TransE on examples/kge_fb15k_237.py with 3 runs each, but even without any hyperparameter tuning, the findings seem fairly consistent with the paper, i.e. significant improvements in all metrics [colab]:

| Method | Eval set | Mean Rank | MRR | Hits@10 |
|---|---|---|---|---|
| TransD Bernoulli | Val | 176.19 ± 0.74 | 0.223 ± 0.000 | 0.408 ± 0.001 |
| TransD Bernoulli | Test | 180.65 ± 0.71 | 0.220 ± 0.001 | 0.401 ± 0.002 |
| TransD | Val | 180.08 ± 0.37 | 0.211 ± 0.003 | 0.400 ± 0.002 |
| TransD | Test | 184.06 ± 1.78 | 0.210 ± 0.003 | 0.396 ± 0.002 |
| TransE | Val | 258.74 ± 5.76 | 0.221 ± 0.000 | 0.366 ± 0.004 |
| TransE | Test | 268.30 ± 6.39 | 0.217 ± 0.001 | 0.362 ± 0.004 |

mattjhayes3 and others added 30 commits November 6, 2024 21:24
Before the fix, tensors could only be concatenated over `dim=0`: the `dim`
argument was not used by any operation in the function. This update
allows the tensors to be concatenated over any given dimension.
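The `dim` semantics being fixed can be illustrated with plain nested lists standing in for 2D tensors (a hedged sketch, not the library code):

```python
def cat(tensors, dim=0):
    """Concatenate 2D nested lists along dim 0 (rows) or dim 1 (columns)."""
    if dim == 0:
        # Stack all rows one after another
        return [row for t in tensors for row in t]
    # dim == 1: join corresponding rows side by side
    return [sum(rows, []) for rows in zip(*tensors)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
cat([a, b], dim=0)  # [[1, 2], [3, 4], [5, 6], [7, 8]]
cat([a, b], dim=1)  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```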

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Matplotlib arrows don't have a source and a destination. They have a
text and a dot. As the
[example](https://matplotlib.org/stable/gallery/text_labels_and_annotations/fancyarrow_demo.html)
shows, a `<-` arrow points from the dot to the text and `->` points from
the text to the dot.
<img width="153" alt="Screenshot 2024-11-11 at 15 47 24"
src="https://github.com/user-attachments/assets/7a8333d3-6c58-46b2-a47b-ae6c6afff87d">
`_visualize_graph_via_networkx()` sets `xy=pos[src]` and
`xytext=pos[dst]`, so we want arrows from the dot (`xy`) to the text
(`xytext`). That's the `<-` arrow.

This can also be confirmed by comparing the visualization to the
GraphViz version. The arrows go the other way. Or just looking at the
data and the picture, haha! It took me a while to figure out that it's
not my graph that's messed up! 😅
…ral mesh elements (pyg-team#9776)

Now transforms 2D triangular elements/faces with shape [3, n], as well
as 3D tetrahedral elements with shape [4, n], to edges of shape [2, n].

Includes a pytest case for face input with shape [4, n].

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rusty1s <[email protected]>
…pyg-team#9748)

This patch adds support for the torch_delaunay package, which works with
Torch tensors. The new implementation uses the `torch_delaunay` package if
it is installed (by default), and falls back to the `scipy`
implementation otherwise.
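The optional-dependency dispatch described above follows a standard pattern; a minimal sketch (the helper name is mine for illustration, not the PR's API):

```python
def triangulation_backend():
    """Prefer torch_delaunay when it is installed; else fall back to scipy."""
    try:
        import torch_delaunay  # noqa: F401  (tensor-native Delaunay triangulation)
        return "torch_delaunay"
    except ImportError:
        return "scipy"
```

Resolving the backend at call time (rather than at import time) keeps the module importable even when neither optional package is present.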

---------

Co-authored-by: rusty1s <[email protected]>
…yg-team#9756)

Example use case:

```python
from torch_geometric.utils import k_hop_subgraph
import torch

edge_index = torch.tensor([[1, 2, 3],
                           [0, 1, 1]])
# get the 2-hop neighbors of node 0 in the directed graph.
_, edge_index, _, edge_mask = k_hop_subgraph(0, 2, edge_index, relabel_nodes=False, directed=True)
```

Expected Outcome:
```python
>>> edge_index
tensor([[1, 2, 3],
        [0, 1, 1]])

>>> edge_mask
tensor([True,  True,  True])
```

Actual Outcome:
```python
>>> edge_index
tensor([[2, 3],
        [1, 1]])
>>> edge_mask
tensor([False,  True,  True])
```

This stems from the fact that the line `torch.index_select(node_mask, 0,
row, out=edge_mask)` overwrites `edge_mask`, effectively only marking
the edges used in the final hop as `True`.

To fix this, I have added a `preserved_edge_mask` that marks the
edges used in each hop as `True`.
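The fix can be sketched in plain Python: OR each hop's edge mask into an accumulated mask instead of overwriting it (names are mine for illustration, not the PyG internals):

```python
def k_hop_edge_mask(seed, num_hops, src, dst):
    """Mark every edge traversed in ANY hop, not only the last one."""
    frontier = {seed}
    preserved = [False] * len(src)  # accumulated across hops (the fix)
    for _ in range(num_hops):
        hop_mask = [d in frontier for d in dst]  # edges entering the frontier
        frontier = {s for s, m in zip(src, hop_mask) if m}
        # OR into the preserved mask rather than overwriting it
        preserved = [p or m for p, m in zip(preserved, hop_mask)]
    return preserved
```

On the example above (`src=[1, 2, 3]`, `dst=[0, 1, 1]`, seed 0, 2 hops), hop 1 keeps edge 1→0 and hop 2 keeps 2→1 and 3→1, so all three mask entries end up `True`, matching the expected outcome.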

---------

Co-authored-by: rusty1s <[email protected]>
Removes `TensorAttr.fully_specify` which was originally added in pyg-team#4534.

---------

Co-authored-by: rusty1s <[email protected]>
Fixed the typo in the description of NeighborLoader.
reopened  pyg-team#9591 

Feature summary:

- Add GLEM as a GNN & LLM co-training model to PyG
- Adapt GLEM's LM to `AutoModelForSequenceClassification` from `transformers`
- LoRA support
- LM/LLM support
- ogbn-products/ogbn-arxiv testing finished
- `TAGDataset` can be used as a wrapper class for any node classification dataset in PyG, with an LM tokenizer and associated raw text
- External predictions as pseudo labels supported

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rishi Puri <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>
### Issue
- pyg-team#9694 
- pyg-team#9698

### Feature Summary

- Add `MoleculeGPTDataset`
- Add `MoleculeGPT` as a GNN & LLM co-training model to PyG
- Add an example for training and testing
- Split the PR into 3 sub-PRs (pyg-team#9723, pyg-team#9724, pyg-team#9725)
- Due to limited hardware resources we couldn't load `lmsys/vicuna-7b-v1.5` and used `TinyLlama/TinyLlama-1.1B-Chat-v0.1` instead; the full training pipeline was not tested

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Giovanni Gatti <[email protected]>
Co-authored-by: Rishi Puri <[email protected]>
### Issue
- pyg-team#9694
- pyg-team#9700

### Feature Summary

- Add `GitMolDataset`
- Add `GITMol` as a GNN & LLM co-training model to PyG
- Add an example for pre-training
- Due to limited hardware resources, the full training pipeline was not tested
- The multi-modal cross-attention layers share the same weights, which is not aligned with the original paper

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rishi Puri <[email protected]>
…xamples) (pyg-team#9666)

Follow-up to [PR 9597](pyg-team#9597). Includes multiple changes
related to LLM+GNN experiments and scaling up to a remote backend,
including:

- `LargeGraphIndexer` for building a large knowledge graph locally from multiple samples in an arbitrary dataset
- Remote backend loader and examples for deploying a retrieval algorithm to a third-party backend `FeatureStore` or `GraphStore`
- NVTX profiling tools for nsys users
- Quality-of-life improvements and benchmarking scripts for G-Retriever

Updates using these for WebQSP will be moved to a separate PR.

UPDATE:
PR is being broken up into smaller PRs. These can be previewed here:

- zaristei#6
- zaristei#7
- zaristei#8

---------

Co-authored-by: Zack Aristei <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zachary Aristei <[email protected]>
Co-authored-by: Rishi Puri <[email protected]>
…ion (pyg-team#9807)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Updated to use the new NGC CUDA DL base image. Some differences:

1. `/workspace` is the working directory
2. Removed Python libs that were not included in the NGC PyG image: `torch_scatter torch_sparse torch_cluster torch_spline_conv torchnet==0.0.4 h5py`
3. Using the latest stable versions of graphviz and torch

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rishi Puri <[email protected]>
mattjhayes3 and others added 9 commits December 14, 2024 05:00
Fix some issues with the docstrings of LargeGraphIndexer.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zachary Aristei <[email protected]>
To avoid issues when node types contain the `EDGE_TYPE_STR_SPLIT`
delimiter.

---------

Co-authored-by: rusty1s <[email protected]>
This reverts commit e20f018.
@mattjhayes3 mattjhayes3 marked this pull request as ready for review December 15, 2024 01:41
@mattjhayes3 mattjhayes3 changed the base branch from master to multigpu-cleanup December 15, 2024 01:49
@mattjhayes3 mattjhayes3 changed the base branch from multigpu-cleanup to master December 15, 2024 01:49

mattjhayes3 commented Dec 15, 2024

Sorry for all the extra auto-added reviewers! I was trying to fix the timeline with the suggestion in https://stackoverflow.com/questions/16306012/github-pull-request-showing-commits-that-are-already-in-target-branch

Abandoning this in favor of #9866
