Department of Computer Science
Technical Reports in Bioinformatics
To appear in Bioinformatics 2004. In press.
Motivation: Understanding principles of cellular organization and function can be enhanced if we detect known and predict still undiscovered protein complexes within the cell's protein-protein interaction (PPI) network. Such predictions may be used as an inexpensive tool to direct biological experiments. The increasing amount of available PPI data necessitates an accurate and scalable approach to protein complex identification.
Results: We have developed the Restricted Neighbourhood Search Clustering Algorithm (RNSC) to efficiently partition networks into clusters using a cost function. We applied this cost-based clustering algorithm to PPI networks of S. cerevisiae, D. melanogaster, and C. elegans to identify and predict protein complexes. We have determined functional and graph-theoretic properties of true protein complexes from the MIPS database. Based on these properties we defined filters to distinguish between identified network clusters and true protein complexes.
Conclusions: Our application of the cost-based clustering algorithm provides an accurate and scalable method of detecting and predicting protein complexes within a PPI network.
To appear in Knowledge Discovery in Proteomics, Jurisica, I. & Wigle, D., eds., CRC Press.
Networks have been used to model many real-world phenomena to better understand the phenomena and to guide experiments in order to predict their behavior. Since incorrect models lead to incorrect predictions, it is vital to have a correct model. As a result, new techniques and models for analyzing and modeling real-world networks have recently been introduced. One example of large and complex networks involves protein-protein interaction (PPI) networks. We demonstrate that the currently popular scale-free model of PPI networks fails to fit the data in several respects. We show that a random geometric model provides a much more accurate model of the PPI data.