This repository is the official PyTorch implementation of GraphRNN, a graph generative model based on an auto-regressive model.
Jiaxuan You*, Rex Ying*, Xiang Ren, William L. Hamilton, Jure Leskovec, GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models (ICML 2018)
Install PyTorch following the instructions on the official website. The code has been tested with PyTorch 0.2.0 and 0.4.0.
conda install pytorch torchvision cuda90 -c pytorch
Then install the other dependencies.
pip install -r requirements.txt
For the GraphRNN model:
main.py is the main executable file, and specific arguments are set in
train.py includes training iterations and calls
create_graphs.py is where we prepare target graph datasets.
For baseline models:
https://github.com/snap-stanford/snap/tree/master/examples/krongen (for generating Kronecker graphs), and
https://github.com/snap-stanford/snap/tree/master/examples/kronfit (for learning parameters for the model).
To adjust the hyper-parameters and input arguments of the model, modify the fields of
args.cuda controls which GPU is used to train the model, and
specifies which dataset is used to train the generative model. See the documentation in
for more detailed descriptions of all fields.
There are several different types of outputs, each saved into a different directory under a path prefix. The path prefix is set at
args.dir_input. Suppose that this field is set to
./graphs contains the pickle files of training, test and generated graphs. Each contains a list of networkx objects.
./eval_results contains the evaluation of MMD scores in txt format.
./model_save stores the model checkpoints.
./nll saves the log-likelihood for generated graphs as sequences.
./figures is used to save visualizations (see Visualization of graphs section).
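The layout implied by the list above can be sketched with the standard library. This is only an illustration: the helper name `make_output_dirs` and the temporary prefix are hypothetical and are not part of the repository, which handles its own output paths.

```python
from pathlib import Path
import tempfile

# Subdirectory names taken from the list above.
OUTPUT_SUBDIRS = ("graphs", "eval_results", "model_save", "nll", "figures")


def make_output_dirs(prefix):
    """Create each output subdirectory under `prefix` and return the paths by name.

    Hypothetical helper for illustration; not a function from the repository.
    """
    root = Path(prefix)
    paths = {name: root / name for name in OUTPUT_SUBDIRS}
    for path in paths.values():
        # exist_ok=True makes the call safe to repeat on an existing layout.
        path.mkdir(parents=True, exist_ok=True)
    return paths


# Example with a throwaway prefix; substitute the prefix you actually use.
with tempfile.TemporaryDirectory() as tmp:
    created = make_output_dirs(tmp)
    print(sorted(created))
```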
The evaluation is done in
evaluate.py, where the user can choose which settings to evaluate.
To evaluate how close the generated graphs are to the ground truth set, we use MMD (maximum mean discrepancy) to calculate the divergence between two sets of distributions related to
the ground truth and generated graphs.
Three types of distributions are chosen: degree distribution, clustering coefficient distribution, and orbit counts (described below).
The first two are implemented in
eval/stats.py, using the Python multiprocessing
module. One can easily extend the evaluation to compute MMD for other graph distributions.
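To make the MMD idea concrete, here is a minimal, standard-library-only sketch that compares two sets of graphs through their degree histograms. It is an illustration under assumptions, not the code in eval/stats.py: graphs are represented as plain adjacency dictionaries rather than networkx objects, the kernel is an ordinary Gaussian kernel on histograms (the repository's kernel may differ), and all function names are hypothetical.

```python
import math
from collections import Counter


def degree_histogram(adjacency):
    """Normalized degree histogram of a non-empty graph given as {node: set_of_neighbors}."""
    counts = Counter(len(neighbors) for neighbors in adjacency.values())
    total = sum(counts.values())
    return [counts.get(d, 0) / total for d in range(max(counts) + 1)]


def gaussian_kernel(p, q, sigma=1.0):
    """Gaussian kernel between two histograms, zero-padded to a common length."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    squared_distance = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def mean_kernel(xs, ys, kernel):
    """Average kernel value over all pairs drawn from xs and ys."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of the squared MMD between two samples of histograms."""
    return (mean_kernel(xs, xs, kernel)
            + mean_kernel(ys, ys, kernel)
            - 2 * mean_kernel(xs, ys, kernel))


# Toy data: a triangle as the "ground truth" set, a 3-node path as the "generated" set.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
reference = [degree_histogram(triangle)]
generated = [degree_histogram(path)]
print(mmd_squared(reference, generated))
```

Identical sets give a value of zero, and the value grows as the two sets of histograms drift apart. The same `mmd_squared` function could be reused with any other per-graph statistic that is expressed as a histogram.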
We also compute the orbit counts for each graph, represented as a high-dimensional data point. We then compute the MMD
between the two sets of sampled points using ORCA (see http://www.biolab.si/supp/orca/orca.html) at
One first needs to compile ORCA with
g++ -O2 -std=c++11 -o orca orca.cpp
(the binary already in the repo works on Ubuntu).
To evaluate, run
Arguments specific to evaluation are specified in the class
evaluate.Args_evaluate. Note that the field
Args_evaluate.dataset_name_all must only contain
datasets on which a model has already been trained, by setting args.graph_type to each of the datasets and running
The training, testing and generated graphs are saved at ‘graphs/’.
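Because these files are pickles holding lists of graphs, they can also be read back with the standard library alone. The sketch below is a hypothetical stand-in (the name `load_pickled_graphs` and the demo file name are assumptions, not the repository's API), and the demo uses plain edge lists in place of networkx objects so that it runs without extra dependencies.

```python
import os
import pickle
import tempfile


def load_pickled_graphs(path):
    """Load a pickled list of graphs; raise TypeError if the file holds something else.

    Hypothetical helper for illustration; not a function from the repository.
    """
    with open(path, "rb") as handle:
        graphs = pickle.load(handle)
    if not isinstance(graphs, list):
        raise TypeError(f"expected a list of graphs, got {type(graphs).__name__}")
    return graphs


# Round-trip demo with stand-in data (edge lists instead of networkx graphs).
stand_in = [[(0, 1), (1, 2)], [(0, 1)]]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_graphs.dat")  # hypothetical file name
    with open(demo_path, "wb") as handle:
        pickle.dump(stand_in, handle)
    loaded = load_pickled_graphs(demo_path)
print(len(loaded))
```

When the pickle holds networkx graphs, networkx must be importable at load time. Pickle files can execute arbitrary code when loaded, so only open files you produced yourself or otherwise trust.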
One can visualize the generated graphs using the function
utils.load_graph_list, which loads the
list of graphs from the pickle file, and
utils.draw_graph_list, which plots the graphs using
Jesse Bettencourt and Harris Chan have made a great set of slides introducing GraphRNN in Prof. David Duvenaud’s seminar course Learning Discrete Latent Structure.