清华合成与系统生物学中心
Tsinghua Center for Synthetic and Systems Biology
Oct 2013

DNA Copy Number Aberrations in Cancer and Evidence-Based Text Mining for Cancer

Time: 10:00 AM-12:00 PM, Oct 24, Thu
Place: FIT 1-312
Speaker: Prof. Hyunju Lee, Ph.D., School of Information and Communications, Gwangju Institute of Science and Technology (GIST), South Korea
ABSTRACT: This talk consists of two main parts. The first part introduces integrative approaches for analyzing DNA copy number aberrations (CNAs), gene expression, and protein-protein interactions in cancer. To find cancer-related genes and pathways, we developed a voting-based method for identifying cancer modules that combines topological and data-driven properties, and a wavelet-based method for distinguishing cancer-driving genes from passenger genes. The second part introduces DigSee, a disease gene search engine. DigSee is a Web service that searches MEDLINE abstracts for evidence sentences stating that 'genes' are involved in the development of 'cancer' through 'biological events'. DigSee is available at http://gcancer.org/digsee.
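The abstract does not describe how the wavelet-based driver/passenger method works. Wavelet analysis of copy number data, in general, decomposes a log-ratio profile along the genome into coarse-scale and fine-scale components. As a generic, hypothetical illustration of that idea only (not the speaker's method), the sketch below applies an orthonormal Haar transform to a toy CNA log-ratio profile and zeroes small detail coefficients. The function names and the `threshold` parameter are assumptions made for this example.

```python
import math


def haar_forward(x):
    """Full orthonormal Haar decomposition; len(x) must be a power of two.
    Returns (coarsest approximation, detail bands ordered finest to coarsest)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = list(x), []
    while len(approx) > 1:
        new_approx, detail = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            new_approx.append((a + b) / math.sqrt(2))  # local average (scaled)
            detail.append((a - b) / math.sqrt(2))      # local difference (scaled)
        details.append(detail)
        approx = new_approx
    return approx, details


def haar_inverse(approx, details):
    """Exact inverse of haar_forward."""
    x = list(approx)
    for detail in reversed(details):  # rebuild from the coarsest band outward
        out = []
        for s, d in zip(x, detail):
            out.append((s + d) / math.sqrt(2))
            out.append((s - d) / math.sqrt(2))
        x = out
    return x


def denoise_profile(log_ratios, threshold):
    """Hypothetical helper: hard-threshold Haar detail coefficients of a
    copy-number log-ratio profile, keeping only large-scale changes."""
    approx, details = haar_forward(log_ratios)
    kept = [[d if abs(d) >= threshold else 0.0 for d in band] for band in details]
    return haar_inverse(approx, kept)


# Toy profile: a one-copy gain in the second half plus small probe-level noise.
profile = [0.05, -0.05, 0.05, -0.05, 1.05, 0.95, 1.05, 0.95]
print([round(v, 6) for v in denoise_profile(profile, threshold=0.5)])
```

The small alternating fluctuations end up in the finest detail band and fall below the threshold, while the step between the two halves is carried by a large coarse-scale coefficient and survives.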

Differential Gene Expression Analysis Using Coexpression and RNA-Seq Data

Time: 10:00 AM-12:00 PM, Oct 24, Thu
Place: FIT 1-312
Speaker: Prof. Tao Jiang, Ph.D., Department of Computer Science and Engineering, University of California, Riverside and Tsinghua University, Beijing
ABSTRACT: As a fundamental tool for discovering genes involved in a disease or biological process, differential gene expression analysis plays an important role in genomics research. High-throughput sequencing technologies such as RNA-Seq are increasingly used for differential gene expression analysis, a task dominated by microarray technology over the past decade. However, inferring differential expression from observed differences in RNA-Seq read counts poses unique challenges that were not present in microarray-based analysis. Differential expression estimates may be biased against genes with low read counts, so that differential expression is detected more easily for genes with high read counts. If it is not corrected, this estimation bias may further propagate into downstream analyses at the systems biology level. In this work, we propose MRFSeq, a new efficient algorithm for detecting differentially expressed genes based on a Markov random field (MRF) model that uses additional coexpression data to enhance prediction power. Our main technical contribution is a careful construction of the clique potential functions in the MRF so that its maximum a posteriori (MAP) estimate can be reduced to the well-known maximum flow problem and thus computed in polynomial time. Our extensive experiments on simulated and real RNA-Seq datasets demonstrate that MRFSeq is more accurate and less biased against genes with low read counts than existing methods based on RNA-Seq data alone. For example, on the well-studied MAQC dataset, MRFSeq improved the sensitivity for genes with low read counts from 11.6% to 38.8%.
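The key reduction in this abstract, computing the MAP estimate of a binary MRF as a maximum-flow/minimum-cut problem, is a standard construction for energies whose pairwise terms are non-negative penalties on disagreeing labels. The sketch below illustrates that construction under assumed potentials. These are not MRFSeq's actual clique potentials. Each gene gets a label (1 = differentially expressed). The unary costs are negative log-probabilities from a hypothetical per-gene probability `p_de` computed from read counts alone. Each coexpressed pair `(i, j, w)` pays an assumed penalty `w >= 0` when its labels disagree. With this encoding, the value of the minimum s-t cut equals the minimum energy, so the cut gives the MAP labelling. Here the cut is found with Edmonds-Karp.

```python
import math
from collections import deque


def max_flow_min_cut(n_nodes, edges, s, t):
    """Edmonds-Karp max flow. Returns (flow value, nodes on the source side of a min cut)."""
    cap = [dict() for _ in range(n_nodes)]  # residual capacities
    for u, v, c in edges:
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual edge
    flow = 0.0
    while True:
        parent = {s: None}  # BFS for a shortest augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:  # no augmenting path: parent = nodes reachable from s
            return flow, set(parent)
        bottleneck, v = math.inf, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def unary(p):
    """Assumed unary costs (U(0), U(1)) = (-log(1 - p), -log p), with p clipped."""
    p = min(max(p, 1e-9), 1 - 1e-9)
    return -math.log(1 - p), -math.log(p)


def energy(labels, p_de, coexpr):
    """Energy = sum of unary costs + Potts penalties for disagreeing coexpressed pairs."""
    e = sum(unary(p)[lab] for lab, p in zip(labels, p_de))
    return e + sum(w for i, j, w in coexpr if labels[i] != labels[j])


def map_labels(p_de, coexpr):
    """Hypothetical MAP labelling by graph cut: source side = label 0, sink side = label 1."""
    n = len(p_de)
    s, t = n, n + 1
    edges = []
    for i, p in enumerate(p_de):
        u0, u1 = unary(p)
        edges.append((s, i, u1))  # cut iff gene i is on the sink side (label 1)
        edges.append((i, t, u0))  # cut iff gene i is on the source side (label 0)
    for i, j, w in coexpr:
        edges.append((i, j, w))   # cut iff labels of i and j differ
        edges.append((j, i, w))
    _, source_side = max_flow_min_cut(n + 2, edges, s, t)
    return [0 if i in source_side else 1 for i in range(n)]


p_de = [0.9, 0.45, 0.2, 0.8]
coexpr = [(0, 1, 1.0), (2, 3, 0.1)]
print(map_labels(p_de, coexpr))  # → [1, 1, 0, 1]
```

In this toy run, gene 1 would be labelled 0 on its own evidence (p = 0.45). Its strong coexpression with gene 0 pulls it to 1. The weak coupling between genes 2 and 3 is not enough to override their individual evidence. This loosely mirrors the abstract's motivation for using coexpression to support genes with weak read-count evidence.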
This is joint work with Ei-Wen Yang and Thomas Girke, both at UC Riverside.