You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
JiaxuanYou 1ef475d957 first commit 6 years ago
..
PROTEINS_full_A.txt first commit 6 years ago
PROTEINS_full_graph_indicator.txt first commit 6 years ago
PROTEINS_full_graph_labels.txt first commit 6 years ago
PROTEINS_full_node_attributes.txt first commit 6 years ago
PROTEINS_full_node_labels.txt first commit 6 years ago
README.txt first commit 6 years ago

README.txt

README for dataset PROTEINS_full


=== Usage ===

This folder contains the following comma separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
sparse (block diagonal) adjacency matrix for all graphs,
each line corresponds to (row, col) resp. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
column vector of graph identifiers for all nodes of all graphs,
the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
class labels for all graphs in the dataset,
the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
column vector of node labels,
the value in the i-th line corresponds to the node with node_id i

There are OPTIONAL files if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A_sparse.txt)
labels for the edges in DS_A_sparse.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
matrix of node attributes,
the comma seperated values in the i-th line is the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
regression values for all graphs in the dataset,
the value in the i-th line is the attribute of the graph with graph_id i


=== Previous Use of the Dataset ===

Neumann, M., Garnett R., Bauckhage Ch., Kersting K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.


=== References ===

K. M. Borgwardt, C. S. Ong, S. Schoenauer, S. V. N. Vishwanathan, A. J. Smola, and H. P.
Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(Suppl 1):i47–i56,
Jun 2005.

P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without
alignments. J. Mol. Biol., 330(4):771–783, Jul 2003.