README for dataset ENZYMES

=== Usage ===

This folder contains the following comma-separated text files
(replace DS by the name of the dataset):

n = total number of nodes
m = total number of edges
N = number of graphs

(1) DS_A.txt (m lines)
    sparse (block diagonal) adjacency matrix for all graphs,
    each line corresponds to (row, col), i.e. (node_id, node_id)

(2) DS_graph_indicator.txt (n lines)
    column vector of graph identifiers for all nodes of all graphs,
    the value in the i-th line is the graph_id of the node with node_id i

(3) DS_graph_labels.txt (N lines)
    class labels for all graphs in the dataset,
    the value in the i-th line is the class label of the graph with graph_id i

(4) DS_node_labels.txt (n lines)
    column vector of node labels,
    the value in the i-th line corresponds to the node with node_id i
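A minimal Python sketch (standard library only) of how the four mandatory files fit together: DS_graph_indicator.txt is used to split the global, block-diagonal edge list in DS_A.txt into one edge list per graph. The names `read_lines`, `build_graphs` and `load_dataset` are hypothetical, and parsing labels as integers is an assumption; this README does not fix the label encoding.

```python
# Hypothetical helpers for the file layout described above; not an official loader.
import os
from collections import defaultdict


def read_lines(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def build_graphs(a_lines, indicator_lines, graph_label_lines, node_label_lines):
    """Split the block-diagonal edge list of DS_A.txt into one graph per graph_id.

    node_id and graph_id are 1-based line numbers, as in the file description.
    Labels are parsed as integers (an assumption about the label encoding).
    Returns {graph_id: {"nodes", "edges", "node_labels", "label"}}.
    """
    # Line i of DS_graph_indicator.txt gives the graph_id of node i.
    node_to_graph = {i: int(v) for i, v in enumerate(indicator_lines, start=1)}
    # Line i of DS_node_labels.txt gives the label of node i.
    node_labels = {i: int(v) for i, v in enumerate(node_label_lines, start=1)}

    graphs = defaultdict(lambda: {"nodes": [], "edges": [], "node_labels": {}, "label": None})
    for node_id, graph_id in node_to_graph.items():
        graphs[graph_id]["nodes"].append(node_id)
        graphs[graph_id]["node_labels"][node_id] = node_labels[node_id]

    # Each line of DS_A.txt is "row, col", i.e. an edge between two node_ids.
    # Edges are kept exactly as listed; (i, j) / (j, i) pairs are not deduplicated.
    for line in a_lines:
        row, col = (int(x) for x in line.split(","))
        if node_to_graph[row] != node_to_graph[col]:
            # The matrix is block diagonal, so an edge never crosses two graphs.
            raise ValueError(f"edge ({row}, {col}) spans two graphs")
        graphs[node_to_graph[row]]["edges"].append((row, col))

    # Line i of DS_graph_labels.txt gives the class label of graph i.
    for graph_id, v in enumerate(graph_label_lines, start=1):
        graphs[graph_id]["label"] = int(v)
    return dict(graphs)


def load_dataset(folder, ds="ENZYMES"):
    """Read the four mandatory DS_*.txt files from `folder` (hypothetical helper)."""
    def path(suffix):
        return os.path.join(folder, f"{ds}_{suffix}.txt")

    return build_graphs(
        read_lines(path("A")),
        read_lines(path("graph_indicator")),
        read_lines(path("graph_labels")),
        read_lines(path("node_labels")),
    )
```

For example, `load_dataset("ENZYMES")` would read `ENZYMES/ENZYMES_A.txt` and the other mandatory files, assuming they sit in a folder named `ENZYMES`. Keeping the global node_ids (rather than renumbering per graph) makes it easy to look up rows of the optional per-node files later.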

The following files are OPTIONAL and present only if the respective information is available:

(5) DS_edge_labels.txt (m lines; same size as DS_A.txt)
    labels for the edges in DS_A.txt

(6) DS_edge_attributes.txt (m lines; same size as DS_A.txt)
    attributes for the edges in DS_A.txt

(7) DS_node_attributes.txt (n lines)
    matrix of node attributes,
    the comma-separated values in the i-th line are the attribute vector of the node with node_id i

(8) DS_graph_attributes.txt (N lines)
    regression values for all graphs in the dataset,
    the value in the i-th line is the attribute of the graph with graph_id i
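The optional files are aligned line by line with the mandatory ones: edge files with DS_A.txt, node attributes with node_id, graph attributes with graph_id. A minimal sketch of parsing them, where each argument is a list of lines read from the corresponding file; the function names are hypothetical, and reading edge labels as integers is an assumption.

```python
# Hypothetical parsers for the optional files; not an official loader.


def parse_node_attributes(lines):
    """DS_node_attributes.txt: line i holds the comma-separated attribute vector of node i."""
    return {node_id: [float(x) for x in line.split(",")]
            for node_id, line in enumerate(lines, start=1)}


def parse_graph_attributes(lines):
    """DS_graph_attributes.txt: line i holds the regression value of graph i."""
    return {graph_id: float(line) for graph_id, line in enumerate(lines, start=1)}


def attach_edge_labels(a_lines, edge_label_lines):
    """Pair each edge of DS_A.txt with the label on the same line of DS_edge_labels.txt.

    Both files have m lines. Labels are parsed as integers (an assumption).
    """
    if len(a_lines) != len(edge_label_lines):
        raise ValueError("DS_edge_labels.txt must have as many lines as DS_A.txt")
    labelled = []
    for edge_line, label in zip(a_lines, edge_label_lines):
        row, col = (int(x) for x in edge_line.split(","))
        labelled.append(((row, col), int(label)))
    return labelled
```

A list of `((row, col), label)` pairs is used instead of a dict keyed by edge so that repeated edges in DS_A.txt keep their own labels. DS_edge_attributes.txt could presumably be paired with DS_A.txt the same way, parsing each line as a float vector.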

=== Description ===

ENZYMES is a dataset of protein tertiary structures obtained from Borgwardt et al. (2005),
consisting of 600 enzymes from the BRENDA enzyme database (Schomburg et al., 2004).
The task is to assign each enzyme to one of the 6 top-level EC (Enzyme Commission) classes.
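Since the description fixes two numbers, 600 graphs and 6 classes, one quick sanity check on a downloaded copy is to count the labels in ENZYMES_graph_labels.txt. The sketch keeps labels as raw strings, so it makes no assumption about how the six EC classes are encoded; the function name `class_distribution` is hypothetical.

```python
# Hypothetical sanity check against the totals stated in the description.
from collections import Counter


def class_distribution(graph_label_lines, expected_graphs=600, expected_classes=6):
    """Count graphs per class label and check the expected totals."""
    # One label per non-empty line; line i belongs to graph_id i.
    labels = [line.strip() for line in graph_label_lines if line.strip()]
    counts = Counter(labels)
    if len(labels) != expected_graphs:
        raise ValueError(f"expected {expected_graphs} graphs, found {len(labels)}")
    if len(counts) != expected_classes:
        raise ValueError(f"expected {expected_classes} classes, found {len(counts)}")
    return counts
```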

=== Previous Use of the Dataset ===

Feragen, A., Kasenburg, N., Petersen, J., de Bruijne, M., Borgwardt, K.M.: Scalable
kernels for graphs with continuous attributes. In: C.J.C. Burges, L. Bottou, Z. Ghahramani,
K.Q. Weinberger (eds.) NIPS, pp. 216–224 (2013)

Neumann, M., Garnett, R., Bauckhage, Ch., Kersting, K.: Propagation Kernels: Efficient Graph
Kernels from Propagated Information. Under review at MLJ.

=== References ===

K. M. Borgwardt, C. S. Ong, S. Schoenauer, S. V. N. Vishwanathan, A. J. Smola, and H. P.
Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(Suppl 1):i47–i56,
Jun 2005.

I. Schomburg, A. Chang, C. Ebeling, M. Gremse, C. Heldt, G. Huhn, and D. Schomburg. BRENDA,
the enzyme database: updates and major new developments. Nucleic Acids Research, 32D:431–433, 2004.