Resources and tools for bioinformatics study.

Please contribute to this document if you have new recommendations or comments on the current collections.
Some small tools are also provided under ./small_tools/, such as file merge, gene list merge, gene list comparison, etc.

Created, Oct 6, 2011
Latest update, Oct 7, 2011

***************************************************************************
---------------------------------------------------------------------------
***************************************************************************

1. Data resources

1.1 Protein-protein interaction
    HPRD (recommended for initial study; download the binary interaction file if quality is not a concern; pay attention to the evidence codes if you want to focus on PPIs detected by low-throughput experiments)
    BioGRID
    MINT
    IntAct
    MIPS
    BIND (Note: some protein-DNA interactions are also included)
    STRING (Note: many entries are functional associations rather than physical interactions)

1.2 Pathways
    KEGG (you can download raw data files from its FTP site; see map_title.tab and hsa_gene_map.tab for GeneID mapping)
    BioCarta (no source file; only includes the core genes related to each signaling pathway)
    Reactome (many more species available)
    NetPath
    NCBI PID

1.3 Gene regulations
    TargetScan (predicted miRNA targets; a tool is also available)
    RNA22 (miRNA target prediction tool)
    miRBase (microRNA database)
    TarBase (experimentally verified microRNA target database)
    miR2Disease (a manually curated database that aims to provide a comprehensive resource of miRNA deregulation in various human diseases)
    TRED (transcription regulation database; curated from literature)
    TRANSFAC (TFBS PWMs; much useful information; pay attention to the quality of the PWMs; license required)
    JASPAR (open-access PWMs)
    ENCODE (a huge amount of data)

1.4 Gene function
    Gene Ontology (a review paper is recommended: Rhee et al. Use and misuse of the gene ontology annotations.
      Nat Rev Genet 2008, 9:509-515.)
    NCBI Gene Database

1.5 Gene expression
    NCI-60 project (gene and miRNA expression from 60 cancer cell lines)
    Connectivity Map (gene expression from many cell lines treated with different drugs at different dosages)
    NCBI GEO (you can download .CEL raw data for further processing)
    EBI ArrayExpress (I do not like the file format of ArrayExpress)
    ENCODE (many resources, including RNA-seq data)

1.6 Drug related
    DrugBank
    PubChem
    ATC code
    SIDER

1.7 Disease related
    HPO
    SIDER
    OMIM
    miR2Disease (a manually curated database that aims to provide a comprehensive resource of miRNA deregulation in various human diseases)

1.8 Standard vocabulary
    HGNC (maps many IDs/names to standard IDs; EntrezGeneID is recommended; http://www.genenames.org; sample code is given in ./id_mapper/)
    UMLS (Unified Medical Language System)
    MeSH (Medical Subject Headings)

***************************************************************************
---------------------------------------------------------------------------
***************************************************************************

2. Tools

2.1 Integrated portals and platforms
    UCSC Genome Browser (please learn how to use the Table Browser and how to add a "Custom Track")
    IPA (read the documents at http://www.ingenuity.com/; a commercial integrated functional annotation system)
    Expander
    Cytoscape (network visualization and small-scale network analysis)
    BioConductor (an R platform that includes many packages for bioinformatics analysis; please read its documents)

2.2 Sequence analysis
    BLAST
    BLAT (compares long, highly similar sequences)
    Bowtie (recommended for deep sequencing analysis)
    ClustalX (local multiple alignment)
    Sim4

2.3 Literature mining
    Literature mining scripts written by Jun Yuan (please refer to the dir ./literature_mining/)

2.4 Statistical packages
    fdrtool (calculates q-values from p-values, z-scores, t-scores and correlations)

2.5 Functional annotation or gene set analysis
    GSEA (gene set enrichment analysis
      package)
    DAVID web tools
    Ontologizer (gene set analysis for GO terms with hierarchical information and visualization)

2.6 Gene regulation
    TargetScan (miRNA target prediction)
    RNA22 (miRNA target prediction; easy to use)
    DME/STORM (motif analysis package first written by Andrew Smith, recommended; many other tools in the same package)
    MEME (for small-scale motif analysis; very slow)

2.7 Microarray processing
    dChip (easy to use; please refer to the documents and some scripts under ./microarray_dchip/; ComBat for adjusting batch effects)
    RMA (similar usage to dChip)
    SAM (Significance Analysis of Microarrays, to detect differentially expressed genes; Excel plugin/R scripts; I recommend writing your own code (t-test + FDR adjustment) to identify differentially expressed genes)
    EDGE (identifies differentially expressed genes in time-course datasets; the sample size should be more than 10, in my experience)
    STEM (identifies gene expression patterns in time-course datasets with a limited number of time points; easy to use, Java platform)

***************************************************************************
---------------------------------------------------------------------------
***************************************************************************

3. Conferences and Journals

3.1 Journals
    3.1.1 Bioinformatics
    3.1.2 PLoS Computational Biology
    3.1.3 BMC Bioinformatics/Genomics/Systems Biology
    3.1.4 PLoS ONE
    3.1.5 Nucleic Acids Research (Computational Biology/Webserver Issue/Database Issue)
    3.1.6 Nature Biotechnology/Methods (Computational Biology)

3.2 Conferences
    3.2.1 ISMB/ECCB (Intelligent Systems for Molecular Biology) ***
    3.2.2 RECOMB (Research in Computational Molecular Biology) ***
    3.2.3 APBC (Asia Pacific Bioinformatics Conference) **
    3.2.4 InCOB (International Conference on Bioinformatics) **
    3.2.5 PSB (Pacific Symposium on Biocomputing)
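
Note on 2.7 (t-test + FDR adjustment)

The SAM entry in 2.7 suggests writing your own code for differential expression. The FDR adjustment step could look roughly like the sketch below. It is only an illustration, not the code shipped with this collection: it assumes the per-gene p-values have already been computed by a t-test elsewhere, it uses the Benjamini-Hochberg procedure (an assumption, since the text only says "fdr adjustment"), and the function name bh_adjust is hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: `pvalues` holds one raw p-value per gene, e.g. from a
    two-sample t-test computed elsewhere (not shown here).
    """
    m = len(pvalues)
    # Gene indices sorted from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values stay
    # monotone: a smaller raw p-value never gets a larger adjusted value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy p-values for four hypothetical genes (made-up numbers).
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, bh_adjust(raw)):
        print(f"p={p:.3f}  adjusted={q:.3f}")
```

Genes whose adjusted value falls below a chosen cutoff (0.05 is a common choice) would then be reported as differentially expressed. The fdrtool package listed in 2.4 is an alternative to hand-written code for this step.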