About

Update: September 23rd, 2009

Introduction to Bioinformatics

Chemistry 160A / 260A, Computer Science CM121, CM221

Fall quarter 2009

Lecture: Franz 2258A, Tues., Thurs. 10:00 am - 12:00 pm

Discussion: MS 5200 Tues. 4:00 - 6:00 pm.

Instructor: Christopher Lee, Depts. of Chemistry, Computer Science

TA: Nils Homer, nhomer@cs.ucla.edu

Course URL: http://c260a.bioinformatics.ucla.edu/

Course Schedule

| Date | Lecture | Due | Reading |
|---|---|---|---|
| Sep 24, 2009 | Course Logistics | | JP1-3 |
| Sep 29, 2009 | Bayes' Law | | (EG1-1.11); EG1.12; L1; L2 |
| Oct 1, 2009 | A recipe for inference | | L3 |
| Oct 6, 2009 | Law of Large Numbers | HW1 | L6-7 |
| Oct 8, 2009 | uncertainty; nuisance variables | | EG2.1-2.7; EG3.1-3.7; L4 |
| Oct 13, 2009 | Application: SNP discovery | HW2 | |
| Oct 15, 2009 | ROC curves; hypothesis testing | | JP12.2; L5 |
| Oct 20, 2009 | HMMs, Viterbi algorithm | Project 1 | EG4.4-4.9; JP11.1-11.3; EG11.1; 11.2.2 |
| Oct 22, 2009 | sequence modeling: e.g. gene prediction | | |
| Oct 27, 2009 | forward-backward algorithm | HW3 | L1.6.4-5; EG11.2.1 |
| Oct 29, 2009 | profile HMM alignment | | |
| Nov 3, 2009 | Baum-Welch training | HW4 | JP11.4; EG11.2.3 |
| Nov 5, 2009 | sequence alignment | | JP6.1-6.6 |
| Nov 10, 2009 | MIDTERM | | |
| Nov 12, 2009 | types of alignment | | JP6.8-6.9; 6.11-6.15 |
| Nov 17, 2009 | affine gap penalties; HMM alignment model | Project 2 | JP11.5; EG11.3 |
| Nov 19, 2009 | phylogeny intro | | EG13; EG14.1-14.6 |
| Nov 24, 2009 | evolutionary mutation models | HW5 | |
| Nov 26, 2009 | THANKSGIVING | | |
| Dec 1, 2009 | phylogenetic inference | HW6 | JP10.5-10.12; EG14.7-9 |
| Dec 3, 2009 | review session; course evals | | |
| Dec 10, 2009 | FINAL 3 - 6 pm | | |

Summary

This course covers bioinformatics concepts and methodologies. It emphasizes the concepts behind the rapid development of the field, both to give a conceptual understanding of these very new areas and to give students a foundation for doing innovative work in them. The aim is to teach the conceptual foundations that students need to invent new kinds of bioinformatics, and to teach this material through real problems and examples of solutions.

The course emphasizes statistical inference and algorithmic complexity as the two foundations of bioinformatics. Bioinformatics can be described broadly as the study of the inherent structure of biological information. In practice, this means that bioinformatics problems reduce to discovering whatever patterns are present in the data. This has two components: algorithms for finding a given kind of pattern (and the inherent computational difficulty of finding that pattern), and ways to measure the strength of the evidence that a given pattern is statistically significant (i.e. not just “random noise”). We will consider both components in detail, along with how they interact on various problems. The course will cover bioinformatics algorithms, their foundations in genomics data, and their use for analysis and interpretation of genomics data. We’ll examine sequence analysis and comparative genomics algorithms to understand the fundamental computational issues in searching and analyzing biological data.

Textbooks

Required (available at the UCLA Bookstore). Note that the next course in the sequence, C160B/260B, uses the same textbooks.

  • Jones & Pevzner, An Introduction to Bioinformatics Algorithms
  • Ewens & Grant, Statistical Methods in Bioinformatics

Prerequisites

  • Statistics 100A (or 110A or Math 170A or Biostatistics 100A or 110A) is required. The course will assume a knowledge of statistics at this level.
  • Computer Science 180 (or PIC 60) is also required. An understanding of data structures, algorithms and computational complexity will be assumed. Weekly programming projects will be an important part of the coursework. Students must be competent in a suitable programming language (e.g. Python, C/C++, Java, etc.).

For students who lack these prerequisites, we recommend the newly created course MCDB 172 / 272, “Genomics and Bioinformatics”, offered by Prof. Matteo Pellegrini, also in Fall quarter. This course is tailored more for life science students, and requires no statistics or computer science prerequisites.

If you have any questions about whether you can or should take the course, please contact me.

Grading

  • Graduate students: 10% homework, 25% projects, 20% midterm exam, 30% final exam, 15% term paper.
  • Undergraduate students: 15% homework, 25% projects, 25% midterm exam, 35% final exam.

Homework sets will be due weekly (in class on Thursdays) and will combine mathematical problems with computational projects implementing algorithms studied in class. Students should have access to a suitable programming language (e.g. Python, C/C++, Java) for these projects. Such tools can be downloaded for free and installed on any Mac OS X, Linux or Windows computer. Mac OS X, in particular, comes with Python pre-installed, and a suite of development tools (Xcode) is available as a free download.

Course Syllabus

Inferential Foundations: Genomics presents us with a massive data interpretation problem. Inferring the hidden meaning of these data requires measuring the evidence for competing interpretations from diverse, complex and unreliable observations. We introduce Bayesian statistics and stochastic processes.
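
As an illustration of measuring evidence for competing interpretations, here is a minimal Python sketch of Bayes' Law applied to a toy SNP-calling question. It is not taken from the course materials. The function names, the two-hypothesis setup, the binomial read model, and every number (prior, error rate, read counts) are assumptions chosen only for illustration.

```python
from math import comb


def posterior(priors, likelihoods):
    """Bayes' Law: P(H | D) = P(D | H) P(H) / sum over H' of P(D | H') P(H')."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(D), the normalizing constant
    return {h: p / evidence for h, p in joint.items()}


def binomial_likelihood(k, n, p):
    """Probability of k variant reads out of n if each read shows the variant with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical data: 3 of 10 reads at a site show a non-reference base.
n_reads, n_variant = 10, 3

# Hypothetical prior: heterozygous SNPs are rare.
priors = {"reference": 0.999, "heterozygous": 0.001}

# Assumed models: a 1% sequencing error rate if the site is reference,
# and a 50% chance of reading the variant allele if it is heterozygous.
likelihoods = {
    "reference": binomial_likelihood(n_variant, n_reads, 0.01),
    "heterozygous": binomial_likelihood(n_variant, n_reads, 0.5),
}

post = posterior(priors, likelihoods)
for hypothesis, probability in post.items():
    print(hypothesis, probability)
```

The point of the sketch is that unreliable observations (sequencing errors) and a strong prior against the SNP are both weighed explicitly, rather than deciding by a fixed read-count threshold.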

Sequence Analysis and Comparison Algorithms: Questions about the interesting biological features in genomics datasets almost always map to classic computer science problems. To answer the biological question, you have to solve the computer science problem. This section will introduce the correspondence between basic biological questions and string matching, dynamic programming and Hidden Markov Models.
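
To make the mapping from a biological question ("how similar are these two sequences?") to dynamic programming concrete, here is a minimal Python sketch of global alignment scoring with a linear gap penalty. It is an illustration, not the course's own code. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions, and the sketch returns only the optimal score, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings a and b under a linear gap penalty.

    Uses the standard dynamic programming recurrence, keeping only the
    previous row of the score table, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(
                max(
                    prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                    prev[j] + gap,               # a[i-1] aligned to a gap
                    curr[j - 1] + gap,           # b[j-1] aligned to a gap
                )
            )
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

The running time is proportional to the product of the two sequence lengths, which is the kind of algorithmic cost the course asks students to reason about alongside statistical significance.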

Evolutionary Models and Phylogenetic Inference: A rigorous approach to genetic analysis and the modeling of evolution requires treating both as inference problems. We analyze methods for solving these problems computationally.
