#### DMCA

## Towards an Efficient Discovery of the Topological Representative Subgraphs

### Citations

2226 |
Finding Groups in Data An Introduction to Cluster Analysis
- Kaufman, Rousseeuw
- 1990
(Show Context)
Citation Context ...considered as beneficial, otherwise it is ignored. The assignment and swap steps are iteratively performed until no change. Many implementations of k-medoids have been proposed in the literature. PAM =-=[18]-=- is a pioneer implementation of k-medoids. Later, two other implementations have been proposed which are CLARAN [18] and CLARANS [16]. The main difference between these implementations is in the way o... |

1385 | The Protein Data Bank
- BERMAN, WESTBROOK, et al.
- 2000
(Show Context)
Citation Context ...ive and negative samples. Positive proteins are sampled from a selected protein family, namely G-proteins and C1 set domains, whereas negative proteins are randomly sampled from the Protein Data Bank =-=[21]-=-. The two other datasets are used to evaluate the runtime and the distribution of subgraphs according to their sizes. The dataset of Enzymes, previously used in [22] and [23], is composed of 664 prote... |

643 | gSpan: Graph-based substructure pattern mining
- Yan, Han
- 2002
(Show Context)
Citation Context ... a threshold distance δ. We use δ = 7Å. Formally: e(u, v) = { 1, if ∆(Cα(u), Cα(v)) ≤ δ 0, otherwise Frequent subgraph mining: We use the state-of-the-art method of frequent subgraph discovery gSpan =-=[24]-=- to find the frequent subgraphs in each dataset. We tried different minimum frequency threshold in order to obtain a reasonable number of frequent subgraphs from each dataset. The retained minimum fre... |

534 | Graphs over time: densification laws, shrinking diameters and possible explanations
- Leskovec, Kleinberg, et al.
- 2005
(Show Context)
Citation Context ...tructural similarities would be detected with respect to the soundness of results. Considering topological properties instead of exact or approximate structural isomorphism was inspired by works like =-=[9, 10, 11, 12, 13, 14, 15]-=- where authors showed the importance and efficiency of topological attributes in describing graph data. For instance, in [9], authors proposed a classification framework based on the assumption that g... |

291 | Data clustering: 50 years beyond k-means
- Jain
- 2010
(Show Context)
Citation Context ...g Here, we discuss the second part of our selection approach which is the clustering step. We use k-medoids [16] which is a well known clustering algorithm that is widely used in unsupervised learning=-=[17]-=-. It takes as input a set of objects Ω and a number of clusters k, and gives as output the k cluster centers (called medoids) that we consider as the representative objects. To do so, k-medoids needs ... |

252 | CloseGraph: Mining closed frequent graph patterns
- Yan, Han
- 2003
(Show Context)
Citation Context ...high. As structural similarity represents one major cause of redundancy in frequent subgraphs, many works have been proposed for subgraph selection based on exact or approximate structural similarity =-=[1, 2, 3, 4]-=-. Two pioneer works that fall in this type are [1] and [2]. If a graph is contained in another one then the small graph is called the subgraph and the big one is called the super graph. In [1], a freq... |

142 | CLARANS: A method for clustering objects for spatial data mining
- Ng, Han
- 2002
(Show Context)
Citation Context ...ting to consider the number of loops in each subgraph as feature. 2.3.2. K-medoids clustering Here, we discuss the second part of our selection approach which is the clustering step. We use k-medoids =-=[16]-=- which is a well known clustering algorithm that is widely used in unsupervised learning[17]. It takes as input a set of objects Ω and a number of clusters k, and gives as output the k cluster centers... |

69 | Mining significant graph patterns by leap search
- Yan, Cheng, et al.
- 2008
(Show Context)
Citation Context ...| Moy.|E| | Ω | G-proteins 66 246 971 114792 C1 set domains 76 238 928 258371 Enzymes 664 358 910 253404 AIDS antiviral screen 43850 28 30 6749 The first two datasets were previously used in [19] and =-=[20]-=-. Both datasets will be used to evaluate the quality of the selected subgraphs. In fact, each dataset is composed of two groups of protein 3D-structures equally divided between positive and negative s... |

51 |
Mode-seeking by medoidshifts
- Sheikh, Khan, et al.
- 2007
(Show Context)
Citation Context ...e to define a specific number of clusters. A promising future direction could be to remove the k constraint. This can be simply done using a parameter free clustering algorithm such as Medoids-shifts =-=[25]-=-. As in real-world application the number of subgraphs can be exponential, it would be also interesting to make the approach even more scalable using a parallel clustering algorithm such as CAPEK [26]... |

35 | Distinguishing enzyme structures from non-enzymes without alignments
- DOBSON, DOIG
- 2003
(Show Context)
Citation Context ...sampled from the Protein Data Bank [21]. The two other datasets are used to evaluate the runtime and the distribution of subgraphs according to their sizes. The dataset of Enzymes, previously used in =-=[22]-=- and [23], is composed of 664 proteins. The last dataset shows a set 11 of antiviral screen data (AIDS). It contains the activity test information of 43850 chemical compounds. This dataset was previou... |

24 | Maximal biclique subgraphs and closed pattern pairs of the adjacency matrix: A one-to-one correspondence and mining algorithms.
- Li, Liu, et al.
- 2007
(Show Context)
Citation Context ...ergraph. In both cases, only the closed or maximal subgraphs are maintained and the rest of frequent subgraphs are removed. Many works have been proposed based on closed and maximal subgraphs such as =-=[5, 6]-=-. Although the set of closed or maximal subgraphs is much smaller than the set of frequent ones, the number of subgraphs is still very high in real-world cases. Another fresh work for subgraph selecti... |

23 | Margin:Maximal frequent subgraph mining.
- Thomas, Valluri, et al.
- 2006
(Show Context)
Citation Context ...high. As structural similarity represents one major cause of redundancy in frequent subgraphs, many works have been proposed for subgraph selection based on exact or approximate structural similarity =-=[1, 2, 3, 4]-=-. Two pioneer works that fall in this type are [1] and [2]. If a graph is contained in another one then the small graph is called the subgraph and the big one is called the super graph. In [1], a freq... |

14 | ORIGAMI: A Novel and Effective Approach for Mining Representative Orthogonal Graph Patterns
- Chaoji, Hasan, et al.
- 2008
(Show Context)
Citation Context ...high. As structural similarity represents one major cause of redundancy in frequent subgraphs, many works have been proposed for subgraph selection based on exact or approximate structural similarity =-=[1, 2, 3, 4]-=-. Two pioneer works that fall in this type are [1] and [2]. If a graph is contained in another one then the small graph is called the subgraph and the big one is called the super graph. In [1], a freq... |

11 | On effective presentation of graph patterns: A structural representative approach
- Chen, Lin, et al.
- 2008
(Show Context)
Citation Context |

7 |
Quantification of tissue sections: Graph theory and topology as modelling tools
- Rodenacker, Bischoff
- 1990
(Show Context)
Citation Context ...tructural similarities would be detected with respect to the soundness of results. Considering topological properties instead of exact or approximate structural isomorphism was inspired by works like =-=[9, 10, 11, 12, 13, 14, 15]-=- where authors showed the importance and efficiency of topological attributes in describing graph data. For instance, in [9], authors proposed a classification framework based on the assumption that g... |

7 | Boosting with structure information in the functional space: an application to graph classification
- Fei, Huan
(Show Context)
Citation Context ...G| Moy.|V| Moy.|E| | Ω | G-proteins 66 246 971 114792 C1 set domains 76 238 928 258371 Enzymes 664 358 910 253404 AIDS antiviral screen 43850 28 30 6749 The first two datasets were previously used in =-=[19]-=- and [20]. Both datasets will be used to evaluate the quality of the selected subgraphs. In fact, each dataset is composed of two groups of protein 3D-structures equally divided between positive and n... |

7 | Clustering Performance Data Efficiently at Massive Scales. - Gamblin, Supinski, et al. - 2010 |

6 | melting, large graphs by edge manipulation
- Tong, Prakash, et al.
- 2012
(Show Context)
Citation Context ...tructural similarities would be detected with respect to the soundness of results. Considering topological properties instead of exact or approximate structural isomorphism was inspired by works like =-=[9, 10, 11, 12, 13, 14, 15]-=- where authors showed the importance and efficiency of topological attributes in describing graph data. For instance, in [9], authors proposed a classification framework based on the assumption that g... |

6 |
A novel method for comparing topological models of protein structures enhanced with ligand information
- Veeramalai, Gilbert
- 2008
(Show Context)
Citation Context |

5 | Effective graph classification based on topological and label attributes
- Li, Semerci, et al.
(Show Context)
Citation Context |

5 | Discriminative frequent subgraph mining with optimality guarantees, Stat Anal Data Mining 3(5
- Thoma, Cheng, et al.
- 2010
(Show Context)
Citation Context ...rom the Protein Data Bank [21]. The two other datasets are used to evaluate the runtime and the distribution of subgraphs according to their sizes. The dataset of Enzymes, previously used in [22] and =-=[23]-=-, is composed of 664 proteins. The last dataset shows a set 11 of antiviral screen data (AIDS). It contains the activity test information of 43850 chemical compounds. This dataset was previously used ... |

4 |
Efficiently mining δ-tolerance closed frequent subgraphs
- Takigawa, Mamitsuka
- 2011
(Show Context)
Citation Context ...ergraph. In both cases, only the closed or maximal subgraphs are maintained and the rest of frequent subgraphs are removed. Many works have been proposed based on closed and maximal subgraphs such as =-=[5, 6]-=-. Although the set of closed or maximal subgraphs is much smaller than the set of frequent ones, the number of subgraphs is still very high in real-world cases. Another fresh work for subgraph selecti... |

4 |
Feature Selection on Node Statistics Based Embedding of Graphs
- Gibert, Valveny, et al.
- 2012
(Show Context)
Citation Context |

3 |
The Maximum Common Subgraph Problem: Faster Solutions via Vertex
- Abu-Khzam, Samatova, et al.
- 2007
(Show Context)
Citation Context ...n [3], authors proposed an approach for 2 subgraphs extraction and selection. For selection, the structural similarity between two subgraphs is measured by how much does their maximum common subgraph =-=[8]-=- represents from their overall structure. A very close work is [4], where authors proposed an approach for mining a set of structural representative subgraphs among the frequent ones. They adopted a t... |

2 |
Mephu Nguifo, Smoothing 3D protein structure motifs through graph mining and amino-acids similarities
- Dhifli, Saidi, et al.
- 2013
(Show Context)
Citation Context ...aximal subgraphs is much smaller than the set of frequent ones, the number of subgraphs is still very high in real-world cases. Another fresh work for subgraph selection based on exact isomorphism is =-=[7]-=-. In this work, authors tried to select the called representative unsubstituted subgraphs using a similarity function that exploits the prior domain knowledge. A fundamental constraint in their select... |

1 | Indexing and mining topological patterns for drug discovery
- Ranu, Singh
(Show Context)
Citation Context |