Statistical Methods with Applications

Statistics is a mathematical science pertaining to the collection, presentation, analysis, and interpretation of data. It is applicable to a wide variety of academic disciplines. In this course, we will focus on the principles of basic statistical inference methods and their real-life applications. We will introduce how to use statistical methods to summarize or describe a collection of data, and how to infer patterns implied in the data in a way that accounts for randomness and uncertainty in the observations.

The purpose of this course is to build applied statistics from the first principles of probability theory. Starting from the basics of probability, the properties of random variables, and the common families of distributions, we shall develop the methods of statistical inference, including point estimation, hypothesis testing, interval estimation, analysis of variance (ANOVA), and various regression models. Each new definition, technique, and concept is introduced as a natural extension or consequence of those that precede it, and is illustrated with examples.

Throughout the course, we shall use R as the statistical computing platform to illustrate statistical methods and associated examples, and to demonstrate contemporary applications in bioinformatics.

Reference books
  1. G. Casella and R. L. Berger. Statistical Inference, Thomson Learning, California, 2002.
  2. P. Dalgaard. Introductory Statistics with R, Springer, New York, 2002.
  3. W. N. Venables and B. D. Ripley. Modern Applied Statistics with S, Springer, New York, 2002.

Design and Analysis of Bioinformatics Algorithms

This course is a survey of algorithms and methods in computational biology and bioinformatics. It studies the principles of algorithm design for biological datasets and analyzes how existing algorithms apply to real datasets. Topics covered include biological sequence analysis, gene identification, regulatory motif discovery, genome assembly, genome duplication and rearrangements, evolutionary theory, clustering algorithms, biological network analysis, and network motif identification.

Reference books
  1. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.
  2. D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
  3. N. C. Jones and P. A. Pevzner. An Introduction to Bioinformatics Algorithms (Computational Molecular Biology), MIT Press, 2004.