
        Identifying the genes involved in complex human diseases, called disease genes, remains a challenging problem in computational systems biology. Molecular networks, together with many other data sources that can also be described as networks, give a more comprehensive picture of the cell and can serve as sensors and drivers of human disease. At the molecular level, disease states can be viewed as emergent properties of molecular networks, rather than as the result of core biological processes responding to changes in a small number of genes. This modular view implies a positive correlation between gene-gene relatedness and disease-disease similarity: similar diseases tend to have related causal genes, and functionally similar genes tend to be involved in phenotypically similar diseases. To exploit this correlation, we build a multiple regression model that explains disease similarities by combining the gene closeness measures from the individual data sources. As an information fusion method, it automatically determines the relative contribution of each data source and prioritizes candidate genes without requiring any training genes.

        BRIDGE (‘Based on Regression to Identify Disease Genes’) is a flexible, interactive software tool that prioritizes candidate genes using five data sources: protein-protein interaction (PPI), gene sequence similarity (GS), gene expression data (GE), the pathway database (KEGG) and the Gene Ontology database (GO). First, BRIDGE connects and weights the genes as a network in each data source, and constructs a disease phenotype network in the same way. Second, it assembles the human disease network, the disease-gene information and the multiple data sources (networks) into a single network using the regression model. The model fitness (R square) is calculated and assigned to each candidate gene as its concordance score. Finally, all candidates are ranked by the scores they receive. All of the data used can be found here.
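
        The scoring step described above can be illustrated with a minimal sketch. It is not the BRIDGE implementation: the function names, the input layout (one disease similarity vector, and one closeness column per data source for each candidate gene) and the use of ordinary least squares with an intercept are all assumptions made for illustration.

```python
import numpy as np


def concordance_score(disease_sim, closeness):
    """Return (R square, per-source weights) for one candidate gene.

    Assumed layout (hypothetical, not taken from BRIDGE itself):
      disease_sim : length-n vector, similarity of the query disease
                    to n other diseases
      closeness   : n x k matrix, closeness of the candidate gene to the
                    known genes of each of those diseases, one column per
                    data source (e.g. PPI, GS, GE, KEGG, GO)
    """
    y = np.asarray(disease_sim, dtype=float)
    X = np.asarray(closeness, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    # R square is undefined for a constant response; report 0 in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return r2, coef[1:]


def rank_candidates(disease_sim, closeness_by_gene):
    """Rank candidate genes by R square, best first.

    closeness_by_gene maps a gene identifier to its n x k closeness matrix.
    """
    scored = [
        (gene, concordance_score(disease_sim, X)[0])
        for gene, X in closeness_by_gene.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

        In this sketch the fitted coefficients play the role of the relative contributions of the data sources, and no training genes are needed because the fit is done separately for each candidate.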

        The method has been used to infer the genome-wide molecular basis of all diseases, charting a genetic landscape of human disease. Within the regression model, R square is calculated for every pair among the 1126 diseases and 8919 genes and used to rank all candidate genes for each disease. Clustering this 1126 × 8919 matrix reveals further relationships among genes and diseases. Further analysis of obesity and diabetes shows that most of the predicted genes ranked in the top 100 are disease genes. The predicted results for the 1126 diseases and 8919 genes can be viewed through a user-friendly web interface. Additionally, the method can flexibly incorporate additional data sources, and can be further used to prioritize candidate genes underlying biological processes and to infer gene function modules.
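
        As a small illustration of the ranking step, the sketch below takes one row of such a disease-by-gene score matrix and returns the top-ranked genes, together with the fraction of them found in a set of already known disease genes. The function names and the toy identifiers in the usage note are hypothetical and are not part of BRIDGE.

```python
def top_candidates(scores, gene_ids, k=100):
    """Return the k highest-scoring (gene, score) pairs for one disease.

    scores   : R square values for one disease, one per gene
    gene_ids : gene identifiers in the same order as scores
    Ties keep their input order because Python's sort is stable.
    """
    if len(scores) != len(gene_ids):
        raise ValueError("scores and gene_ids must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(gene_ids[i], scores[i]) for i in order[:k]]


def fraction_known(ranked, known_genes):
    """Fraction of the ranked genes that appear in known_genes."""
    if not ranked:
        return 0.0
    hits = sum(1 for gene, _ in ranked if gene in known_genes)
    return hits / len(ranked)
```

        For example, `top_candidates([0.2, 0.9, 0.5], ["G1", "G2", "G3"], k=2)` keeps the two best-scoring toy genes, and `fraction_known` can then be applied to that list with any set of reference genes.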

©2010. All Rights Reserved
Beijing, 100084, P.R.CHINA