We consider graphs generated by the stochastic block model (Wang and Wong, 1987).
Graphs are generated by the following process:
Let
- Let $\widehat{G}$ be the empty graph.
- For each $v \in G$, add a set of $c$ vertices $v_1,...,v_c$ to $\widehat{G}$.
- For each $v \in G$, connect pairs $\{v_i,v_j\} \in V(\widehat{G})$ by an edge with probability $p$.
- For all pairs $\{v_i,u_j\} \in V(\widehat{G})$ with $\{v,u\} \in E(G)$, connect them by an edge with probability $p$ as well.
- Connect a number of $m_x$ previously unconnected vertex pairs $\{v_i,u_j\}$ in $\widehat{G}$ as noise edges.
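
The process above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's script: the function name `block_graph`, the integer vertex numbering, and the representation of the base graph as a vertex list plus an edge list are assumptions made for the example. It also assumes that noise pairs are drawn uniformly from all unconnected pairs, which the description does not specify, and it uses only the standard library.

```python
import itertools
import random


def block_graph(base_nodes, base_edges, c, p, m, seed=None):
    """Expand every base vertex into a block of c vertices (hypothetical helper).

    Returns the number of vertices and a set of edges (a, b) with a < b.
    The base graph is assumed to be simple (no loops, no duplicate edges).
    """
    rng = random.Random(seed)

    # Block of the k-th base vertex: vertex ids k*c, ..., k*c + c - 1.
    blocks = {v: range(k * c, (k + 1) * c) for k, v in enumerate(base_nodes)}
    edges = set()

    # Pairs inside a block are connected with probability p.
    for members in blocks.values():
        for a, b in itertools.combinations(members, 2):
            if rng.random() < p:
                edges.add((a, b))

    # Pairs from two blocks whose base vertices are adjacent: probability p.
    for v, u in base_edges:
        for a, b in itertools.product(blocks[v], blocks[u]):
            if rng.random() < p:
                edges.add((min(a, b), max(a, b)))

    # Noise: up to m previously unconnected pairs, chosen uniformly (assumption).
    n_vertices = c * len(base_nodes)
    free = [pair for pair in itertools.combinations(range(n_vertices), 2)
            if pair not in edges]
    edges.update(rng.sample(free, min(m, len(free))))
    return n_vertices, edges


# Base graph: the path a - b - c. With p = 1.0 every allowed pair is connected.
n_vertices, edges = block_graph(["a", "b", "c"], [("a", "b"), ("b", "c")],
                                c=3, p=1.0, m=2, seed=0)
print(n_vertices, len(edges))  # 9 29
```

With `p=1.0` the count is easy to check by hand: 3 blocks with 3 internal pairs each, 2 base edges with 9 cross pairs each, plus 2 noise edges.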
The resulting classification task is to assign the generated graph
All datasets considered in the evaluation of (Schulz et al., 2022) were created starting with a random tree of size
- Yuchung J. Wang and George Y. Wong (1987): Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397), 8–19.
- Till Schulz, Tamas Horvath, Pascal Welke, Stefan Wrobel (2022): A Generalized Weisfeiler-Lehman Graph Kernel. Machine Learning. https://doi.org/10.1007/s10994-022-06131-w
The file in this repository requires the networkx library and can then be used as a standalone script.
The script writes its output in the file format used by the TUDataset collection.
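
For readers unfamiliar with that format, the following sketch writes the three core files of a TUDataset-style dataset: `<name>_A.txt` (one `row, col` pair per line, with each undirected edge listed in both directions), `<name>_graph_indicator.txt` (the graph id of each vertex) and `<name>_graph_labels.txt` (one class label per graph). The function name `write_tudataset` and the input representation (each graph as a pair of vertex count and a set of 0-based edges) are assumptions for illustration; which files the repository's script actually writes is not stated here.

```python
from pathlib import Path


def write_tudataset(name, graphs, labels, out_dir="."):
    """Write graphs in a TUDataset-style layout (hypothetical helper).

    graphs: list of (n_vertices, edges), edges being pairs of 0-based ids.
    labels: one class label per graph.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    adjacency, indicator = [], []
    offset = 0  # vertex ids are 1-based and global across the whole dataset
    for graph_id, (n_vertices, edges) in enumerate(graphs, start=1):
        indicator.extend([graph_id] * n_vertices)
        for a, b in sorted(edges):
            # Each undirected edge appears once per direction.
            adjacency.append(f"{offset + a + 1}, {offset + b + 1}")
            adjacency.append(f"{offset + b + 1}, {offset + a + 1}")
        offset += n_vertices

    def dump(suffix, lines):
        text = "".join(f"{line}\n" for line in lines)
        (out / f"{name}_{suffix}.txt").write_text(text)

    dump("A", adjacency)
    dump("graph_indicator", indicator)
    dump("graph_labels", labels)
```

A call such as `write_tudataset("DEMO", [(2, {(0, 1)}), (3, {(0, 2)})], [0, 1])` would place `DEMO_A.txt`, `DEMO_graph_indicator.txt` and `DEMO_graph_labels.txt` in the current directory.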
Use it on the command line with the following arguments:

    Blockgraph Dataset Generator

    optional arguments:
      -h, --help   show this help message and exit
      --name NAME  Name of output file(s)
      --N N        Number of graphs per class
      --n N        Number of blocks
      --c C        Size of blocks
      --p P        Edge probability
      --m M        Number of noise edges
      --seed SEED  Random seed
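
To make the meaning of each option concrete, the sketch below declares the same flags with `argparse`. Only the flag names and help strings come from the help text; the function name `build_parser` and the argument types (integers for counts and the seed, a float for the probability) are assumptions, and no defaults are implied.

```python
import argparse


def build_parser():
    """Declare the listed options (hypothetical reconstruction, types assumed)."""
    parser = argparse.ArgumentParser(description="Blockgraph Dataset Generator")
    parser.add_argument("--name", type=str, help="Name of output file(s)")
    parser.add_argument("--N", type=int, help="Number of graphs per class")
    parser.add_argument("--n", type=int, help="Number of blocks")
    parser.add_argument("--c", type=int, help="Size of blocks")
    parser.add_argument("--p", type=float, help="Edge probability")
    parser.add_argument("--m", type=int, help="Number of noise edges")
    parser.add_argument("--seed", type=int, help="Random seed")
    return parser


# Example: 100 graphs per class, 10 blocks of 5 vertices, p = 0.5, 20 noise edges.
args = build_parser().parse_args(
    ["--name", "demo", "--N", "100", "--n", "10", "--c", "5",
     "--p", "0.5", "--m", "20", "--seed", "42"]
)
```

Note that `--N` and `--n` are distinct options because flag names are case-sensitive.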
If you use this generator in any of your work, please cite our article:
Till Schulz, Tamas Horvath, Pascal Welke, Stefan Wrobel: A generalized Weisfeiler-Lehman graph kernel. Mach Learn (2022). https://doi.org/10.1007/s10994-022-06131-w