About
Introduction to Bioinformatics
Chemistry 160A / 260A, Computer Science CM121, CM221
Fall quarter 2009
Lecture: Franz 2258A, Tues, Thurs 10:00 am - 12:00
Discussion: MS 5200 Tues. 4:00 - 6:00 pm.
- Instructor: Christopher Lee, Depts. of Chemistry, Computer Science
- Contact: leec@chem.ucla.edu; TEL 825-7374
- Office Hours:
TA: Nils Homer, nhomer@cs.ucla.edu
Course URL: http://c260a.bioinformatics.ucla.edu/
Revised Sept. 23, 2009
Schedule
| Date | Lecture | Due | Reading |
|---|---|---|---|
| Sep 24, 2009 | Course Logistics | | JP1-3 |
| Sep 29, 2009 | Bayes Law | | (EG1-1.11); EG1.12; L1; L2 |
| Oct 1, 2009 | A recipe for inference | | L3 |
| Oct 6, 2009 | Law of Large Numbers | HW1 | L6-7 |
| Oct 8, 2009 | uncertainty; nuisance variables | | EG2.1-2.7; EG3.1-3.7; L4 |
| Oct 13, 2009 | Application: SNP discovery | HW2 | |
| Oct 15, 2009 | ROC curves; hypothesis testing | | JP12.2; L5 |
| Oct 20, 2009 | HMMs, Viterbi algorithm | Project 1 | EG4.4-4.9; JP11.1-11.3; EG11.1; 11.2.2 |
| Oct 22, 2009 | sequence modeling: e.g. gene prediction | | |
| Oct 27, 2009 | forward-backward algorithm | HW3 | L1.6.4-5; EG11.2.1 |
| Oct 29, 2009 | profile HMM alignment | | |
| Nov 3, 2009 | Baum-Welch training | HW4 | JP11.4; EG11.2.3 |
| Nov 5, 2009 | sequence alignment | | JP6.1-6.6 |
| Nov 10, 2009 | MIDTERM | | |
| Nov 12, 2009 | types of alignment | | JP6.8-6.9; 6.11-6.15 |
| Nov 17, 2009 | affine gap penalties; HMM alignment model | Project 2 | JP11.5; EG11.3 |
| Nov 19, 2009 | phylogeny intro | | EG13; EG14.1-14.6 |
| Nov 24, 2009 | evolutionary mutation models | HW5 | |
| Nov 26, 2009 | THANKSGIVING | | |
| Dec 1, 2009 | phylogenetic inference | HW6 | JP10.5-10.12; EG14.7-9 |
| Dec 3, 2009 | review session; course evals | | |
| Dec 10, 2009 | FINAL, 3 - 6 pm | | |
Summary
This course covers bioinformatics concepts and methodologies. It emphasizes the concepts behind the rapid development of the field, both to give students a conceptual understanding of these new areas and to give them a foundation for doing innovative work in them. The goal is to teach the conceptual foundations that will enable students to invent new kinds of bioinformatics, taught through real problems and examples of their solutions.
The course emphasizes statistical inference and algorithmic complexity as the two foundations of bioinformatics. Bioinformatics can be described broadly as the study of the inherent structure of biological information. In practice, this means that bioinformatics problems reduce to discovering whatever patterns are present in the data. This has two components: algorithms for finding a given kind of pattern (and the inherent computational difficulty of finding it), and ways to measure the strength of the evidence that a given pattern is statistically significant (i.e. not just “random noise”). We will consider both components in detail, and how they interact on various problems. The course will cover bioinformatics algorithms, their foundations, and their use in analyzing and interpreting genomics data. We’ll examine sequence analysis and comparative genomics algorithms to understand the fundamental computational issues in searching and analyzing biological data.
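To make the second component concrete: one common (but by no means the only) way to ask whether a pattern is more than “random noise” is a permutation test, which counts how often the pattern occurs and compares that count against shuffled copies of the sequence. The Python sketch below is illustrative only; the function names, the overlapping-count convention and the shuffle-based null model are assumptions, not the course's method.

```python
import random


def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of motif in seq."""
    return sum(1 for i in range(len(seq) - len(motif) + 1)
               if seq[i:i + len(motif)] == motif)


def permutation_p_value(seq, motif, n_shuffles=1000, seed=0):
    """Empirical p-value for observing at least this many motif hits.

    Assumed null model (illustrative): the same bases in random order,
    which preserves base composition but destroys any ordering signal.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = count_motif(seq, motif)
    bases = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if count_motif("".join(bases), motif) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical example: a strongly repetitive sequence should look "significant".
p = permutation_p_value("TATA" * 20, "TATA")
print(f"p = {p:.4f}")
```

A small p-value only says the observed count is rare under this particular null model; a more realistic null (for example, one that preserves dinucleotide frequencies) would generally give a different answer.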
Textbooks
Required (available at the UCLA Bookstore). Note that the next course in the sequence, C160B/260B, uses the same textbooks.
- Jones & Pevzner, An Introduction to Bioinformatics Algorithms.
- Ewens & Grant, Statistical Methods in Bioinformatics.
Prerequisites
- Statistics 100A (or 110A or Math 170A or Biostatistics 100A or 110A) is required for this course. The course assumes knowledge of statistics at this level.
- Computer Science 180 (or PIC 60) is also required for this course. An understanding of data structures, algorithms and computational complexity will be assumed for this course. Weekly programming projects will be an important part of the coursework. Students must be competent in a suitable programming language (e.g. Python, C/C++, Java, etc.).
For students who lack these prerequisites, we recommend the newly created course MCDB 172 / 272, “Genomics and Bioinformatics”, offered by Prof. Matteo Pellegrini, also in Fall quarter. That course is tailored more to life science students and requires no statistics or computer science prerequisites.
If you have any questions about whether you can or should take the course, please contact me.
Grading
- Graduate students: 10% homework, 25% projects, 20% midterm exam, 30% final exam, 15% term paper.
- Undergraduate students: 15% homework, 25% projects, 25% midterm exam, 35% final exam.
Homework sets will be due weekly (in class on Thursdays) and will involve a combination of mathematical problems and computational projects implementing algorithms studied in class. Students should have access to a suitable programming language (e.g. Python, C/C++, Java) for implementing these projects. Such tools can be downloaded free from the web and installed on any Mac OS X, Linux or Windows computer. Mac OS X, in particular, comes with Python pre-installed, and its suite of development tools (Xcode) is available as a free download.
Course Syllabus
Inferential Foundations: Genomics presents us with a massive data interpretation problem. Inferring the hidden meaning of these data requires measuring the evidence for competing interpretations from diverse, complex and unreliable observations. We introduce Bayesian statistics and stochastic processes.
Sequence Analysis and Comparison Algorithms: questions about interesting biological features in genomics datasets almost always map to classic computer science problems; to answer the biological question, you have to solve the computer science problem. This section introduces the correspondence between basic biological questions and string matching, dynamic programming and Hidden Markov Models.
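As a small illustration of weighing competing interpretations with Bayes' law, the Python sketch below computes posterior probabilities over a finite set of hypotheses. The scenario (a heterozygous SNP versus sequencing error at one site), the binomial read model, the 1% error rate and the 0.001 prior are all hypothetical values chosen for illustration, not figures from the course.

```python
from math import comb


def binomial_likelihood(k, n, p):
    """P(k successes in n trials | per-trial success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior(prior, likelihood):
    """Bayes' law over a finite set of competing hypotheses.

    prior:      dict hypothesis -> p(H)
    likelihood: dict hypothesis -> p(obs | H)
    returns:    dict hypothesis -> p(H | obs)
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # p(obs), by the law of total probability
    return {h: j / evidence for h, j in joint.items()}


# Hypothetical observation: 3 of 10 reads at a site show a non-reference base.
k, n = 3, 10
prior = {"het_snp": 0.001, "no_snp": 0.999}  # assumed prior
likelihood = {
    "het_snp": binomial_likelihood(k, n, 0.5),  # heterozygote: half the reads
    "no_snp": binomial_likelihood(k, n, 0.01),  # assumed 1% sequencing error
}
print(posterior(prior, likelihood))
```

Despite the strong prior against a SNP, these data move the posterior to roughly even odds between the two interpretations; changing the assumed error rate or prior shifts that balance.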
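As one example of mapping a biological question (how similar are two sequences?) onto dynamic programming, the sketch below computes a global alignment score in the style of Needleman-Wunsch with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions for illustration; the affine gap penalties and HMM-based alignment listed in the schedule are not shown.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (linear gap penalty)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap
            score[i][j] = max(diag, up, left)
    return score[n][m]


print(global_alignment_score("GATTACA", "GCATGCG"))
```

Each cell depends only on its diagonal, upper and left neighbours, so the score takes O(nm) time and space; recovering the alignment itself would require a traceback over the same table.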
Evolutionary Models and Phylogenetic Inference: a rigorous approach to genetic analysis and modeling of evolution requires treating them as inference problems. We analyze methods for solving these problems computationally.
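As a minimal example of treating evolution as an inference problem, the sketch below estimates the evolutionary distance between two aligned sequences under the Jukes-Cantor substitution model, $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$, where $p$ is the observed fraction of differing sites. Choosing Jukes-Cantor (rather than a richer model) and requiring equal-length, gap-free sequences are simplifying assumptions for illustration.

```python
from math import log


def jukes_cantor_distance(seq1, seq2):
    """Estimated substitutions per site between two aligned DNA sequences.

    Assumes equal-length, gap-free sequences and the Jukes-Cantor model
    (equal base frequencies, all substitutions equally likely).
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    # Observed fraction of sites that differ
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under Jukes-Cantor")
    return -0.75 * log(1 - 4 * p / 3)


print(jukes_cantor_distance("ACGTACGTAC", "ACGTTCGTAA"))
```

Here 2 of 10 sites differ, so $p = 0.2$, and the corrected distance (about 0.23) is larger than $p$ because the model accounts for substitutions hidden by later changes at the same site.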