|
Deciphering the genetic basis of
human diseases is an important goal of biomedical research. On the
basis of the assumption that similar diseases are caused by
functionally related genes, we propose a computational framework
that integrates human protein-protein interactions, disease
phenotype similarities, and known gene-phenotype associations in
order to capture the complex relationships between phenotypes and
genotypes. We develop a tool named CIPHER to predict and prioritize
disease genes, and we show that the global concordance between the
human protein network and the phenotype network reliably predicts
disease genes. Our method is applicable to genetically
uncharacterized phenotypes, effective in the genome-wide scan of
disease genes, and also extendable to explore gene cooperativity in
complex diseases. The predicted genetic landscape of over 1000 human
phenotypes reveals the modular organization of phenotype-genotype
relationships and contributes to the understanding of how genotype
may determine phenotype. The genome-wide prioritization of candidate
genes for over 5000 human phenotypes, including those with
under-characterized disease loci or even those lacking known
association, is publicly released to facilitate future discovery of
disease genes.
This website hosts two
versions of the predicted genetic landscape of human disease. One is
predicted using a highly reliable protein interaction network,
HPRD. The other is predicted on
an extended network combining several high-quality protein
interaction databases. The results based on the HPRD network may be more reliable, while those
based on
the extended network may be more powerful in discovering novel
disease genes. The top
1000 genes for each phenotype can be explored
here. We plan to
update this website to make it more user-friendly for biologists.
We will integrate gene annotations, map the scores onto the
genome, visualize the results on the protein interaction network,
and provide tools for retrieving specific diseases or genes.
|
|
The predicted
genetic landscape of human disease |
| Connecting 5080 human phenotypes
with 14433 human genes |
|
|
|
Data |
Description |
Format |
Download |
|
Genome-wide
prioritization results |
based on the HPRD network |
Each
column of 'rank' is a list of ranked genes for one phenotype, with their
scores in the corresponding column of the 'score' matrix. For example, in
the results based on the extended network, column
243 is breast cancer [MIM #114480]. The first three
elements of this column are the IDs of the top 3 ranked genes: 102,
1370 and 1502, representing BRCA1, ATM and MDC1, respectively. Their
scores are the first three elements of column 243 in
'score': 0.3600, 0.3380 and 0.3348. Columns follow the order of phenotypes
in the disease identifier file.
-----------
*_all: result for all genes
*_top1000: result for top 1000 genes |
MAT
file
(MATLAB 7.0.1) |
score_all 277Mb
score_top1000 32Mb
rank_all 80Mb
rank_top1000 8Mb |
|
based on the
extended network |
score_all 574Mb
score_top1000 31Mb
rank_all 135Mb
rank_top1000 9Mb |
|
Protein
interaction network |
the HPRD network |
34364 non-redundant
protein-protein interactions between 8919 human proteins. Downloaded
from the HPRD database in May 2007. |
Tab-delimited |
 |
|
the extended
network |
72431 non-redundant
protein-protein interactions between 14433 human proteins. Assembled
from the HPRD,
BIND,
MINT and
OPHID databases. |
 |
|
Gene-phenotype
network |
the HPRD network |
These two
files contain the gene-phenotype associations used in this study. The
original associations are extracted from http://www.biomart.org/. After
automatic gene identifier mapping, they are arranged in the following
format:
----------------------
<MIM ID><tab><Gene ID 1><tab><Gene ID 2>.....
For example:
--------------------------------------------------------
254200 249 2588 253 3642
---------------------------------------------------------
indicates that the phenotype 'MIM:254200' has four known disease genes:
249, 2588, 253 and 3642. |
 |
|
the extended network |
 |
|
Gene Identifier |
for the HPRD network |
Gene IDs in
five columns: Gene ID used in 'score' and 'rank'<tab>Gene ID in HPRD (or
extended network)<tab>UniProt/Swiss-Prot Accession<tab>RefSeq Peptide ID<tab>HGNC
symbol. For example: 1 02995
Q13485 NP_005350.1 SMAD4 |
 |
|
the
extended network |
 |
|
Disease Identifier
for both networks |
Each row is a MIM
ID. |
 |
|
Phenotype network |
Phenotype
similarity scores for the 5080 MIM phenotypes used in this study. Please
contact
Dr Han G Brunner for the dataset. |
|
|
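
The file layouts described in the table above can be read with a few lines of Python. What follows is a minimal sketch and not part of the CIPHER release. The function names (`parse_associations`, `parse_gene_ids`, `top_genes`) are hypothetical. The 'rank' and 'score' matrices are assumed to have been loaded already into a row-indexable form (for example with `scipy.io.loadmat`; the variable names stored inside the MAT files are not documented here). The toy matrices in the demo reuse the three breast cancer values quoted in the table, placed in an arbitrary column for illustration only.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def parse_associations(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Read rows of the form <MIM ID><tab><Gene ID 1><tab><Gene ID 2>..."""
    associations: Dict[str, List[int]] = {}
    for line in lines:
        fields = [f for f in line.strip().split("\t") if f]
        if not fields:
            continue  # skip blank lines
        associations[fields[0]] = [int(g) for g in fields[1:]]
    return associations


def parse_gene_ids(lines: Iterable[str]) -> Dict[int, str]:
    """Map the gene ID used in 'score' and 'rank' (column 1) to the HGNC symbol (column 5)."""
    symbols: Dict[int, str] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or incomplete rows
        symbols[int(fields[0])] = fields[4]
    return symbols


def top_genes(
    rank: Sequence[Sequence[float]],
    score: Sequence[Sequence[float]],
    column: int,
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return the first k (gene ID, score) pairs of one phenotype column.

    `column` is 1-based, matching the column numbers quoted on this page.
    """
    col = column - 1  # convert to Python's 0-based indexing
    return [(int(rank[i][col]), float(score[i][col])) for i in range(k)]


if __name__ == "__main__":
    # Example row from the gene-phenotype description.
    print(parse_associations(["254200\t249\t2588\t253\t3642\n"]))

    # Toy 3x2 matrices (assumption): column 2 holds the breast cancer values
    # quoted on this page; in the real extended-network files it is column 243.
    toy_rank = [[7, 102], [8, 1370], [9, 1502]]
    toy_score = [[0.9, 0.3600], [0.8, 0.3380], [0.7, 0.3348]]
    symbols = {102: "BRCA1", 1370: "ATM", 1502: "MDC1"}
    for gene_id, gene_score in top_genes(toy_rank, toy_score, column=2):
        print(gene_id, symbols.get(gene_id, "?"), gene_score)
```

Keeping the column argument 1-based means the numbers quoted on this page (such as column 243) can be passed unchanged. The row order of the disease identifier file gives the MIM ID for each column.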