Five written or computer-based problem sets will be assigned. These are designed to promote deeper understanding of the principles and algorithms discussed in class and to provide hands-on experience with bioinformatics tools.

NOTE: These assignments reference Athena, MIT's UNIX-based computing environment. OCW does not provide access to this environment.

1. Sequence Search, Global Alignment, BLAST Statistics (19 points)
   Problem Set 1 (PDF)
   Solutions to Problem Set 1 (PDF)

2. BWT, Library Complexity, RNA-seq, Genome Assembly, Motifs, Multiple Hypothesis Testing (31 points)
   Problem Set 2 (PDF)
   Problem Set 2 Files (ZIP) (This ZIP file contains: 4 .py files, 1 .index file, and 3 .txt files)
   Solutions to Problem Set 2 (PDF - 1.4MB)

3. Gibbs Sampler, RNA Secondary Structure, Protein Structure with PyRosetta, Connections (25 points)
   Problem Set 3 (PDF)
   Problem Set 3 Files (ZIP) (This ZIP file contains: 3 .py files, 3 .fa files, and 1 .txt file)
   Solutions to Problem Set 3 (PDF - 1.5MB)

4. Bayesian Networks, Refining Protein Structures in PyRosetta, Mutual Information of Protein Residues (21 points)
   Problem Set 4 (PDF)
   Problem Set 4 Files (ZIP) (This ZIP file contains: 3 .py files and 2 .fasta files)
   Solutions to Problem Set 4 (PDF)

5. Network Statistics, Chromatin Structure, Heritability, Association Testing (24 points)
   Problem Set 5 (PDF - 1.1MB)
   Problem Set 5 Files (ZIP)
   Solutions to Problem Set 5 (PDF)


A total of 120 points is available on the problem sets, but your score for the homework portion of the course is capped at 100 points. This means that you can miss one problem set (or part of one) and still do fairly well on this component if you have done well on the others. For example, a student who earned perfect marks on 4 of 5 problem sets, each worth 24 points, would get 96 points for the homework component. That is almost as good as a student who completed all 5 problem sets and earned 90% of the points on each: 0.9 × 120 = 108, which is capped at the maximum score of 100. Because of this, no make-up assignments will be offered. It is still to your advantage to do all five problem sets, as this will help you learn the material in more depth and prepare for exams. Please note that the point values of individual problem sets may vary somewhat from the 24-point average, depending on their length and difficulty.
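
The capping arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration, not an official grading script: the names `PROBLEM_SET_POINTS`, `HOMEWORK_CAP`, and `homework_score` are hypothetical, and the per-set point values are assumed to be the numbers listed with each problem set on this page (they add up to the stated total of 120).

```python
# Point values per problem set, taken from the listing on this page
# (assumption: the number shown with each problem set is its point value).
PROBLEM_SET_POINTS = {1: 19, 2: 31, 3: 25, 4: 21, 5: 24}

# Maximum score for the homework component, as stated in the text.
HOMEWORK_CAP = 100


def homework_score(points_earned):
    """Return the homework-component score: raw points, capped at 100."""
    return min(points_earned, HOMEWORK_CAP)


total_available = sum(PROBLEM_SET_POINTS.values())

# Example from the text: four perfect problem sets at the 24-point average.
skipped_one = homework_score(4 * 24)

# Example from the text: 90% of all available points, which exceeds the cap.
ninety_percent = homework_score(0.9 * total_available)

print(total_available, skipped_one, ninety_percent)
```

Because the cap is applied to the sum rather than to each problem set, points lost on one assignment can be made up on the others, which is the reasoning behind the no-make-up policy.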

Late Assignments

Assignments submitted within 24 hours of the time they were due will be eligible for 50% credit. If necessary, you may turn in your written portion and your programming portion separately. For example, if you turn in your written portion on time and your programming portion by the late deadline, your written work will be eligible for full points but the programming portion will be eligible for only 50% of its points. You may not subdivide your submissions any further. Because answer keys will be posted, no homework will be accepted after the extended deadline.
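
As a rough illustration of how the late policy combines with the written/programming split, here is a minimal Python sketch. It rests on several assumptions that the policy text does not spell out: that "eligible for 50% credit" means earned points are multiplied by 0.5, that a submission exactly 24 hours late still counts as within the window, and that anything later earns nothing. The function names and the example point values are hypothetical.

```python
FULL_CREDIT = 1.0
LATE_CREDIT = 0.5        # 50% credit within the late window
LATE_WINDOW_HOURS = 24   # length of the extended deadline, from the text


def credit_fraction(hours_late):
    """Fraction of earned points that counts for one portion."""
    if hours_late <= 0:
        return FULL_CREDIT
    if hours_late <= LATE_WINDOW_HOURS:  # assumption: boundary is inclusive
        return LATE_CREDIT
    return 0.0  # not accepted after the extended deadline


def problem_set_score(written_points, written_hours_late,
                      programming_points, programming_hours_late):
    """Score one problem set whose two portions may be submitted separately."""
    return (written_points * credit_fraction(written_hours_late)
            + programming_points * credit_fraction(programming_hours_late))


# Hypothetical case: 12 written points earned on time,
# 10 programming points earned but submitted 20 hours late.
example = problem_set_score(12, 0, 10, 20)
print(example)
```

Only the two portions are scored independently here, mirroring the rule that submissions may not be subdivided any further.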

Collaboration on Problem Sets

The goal of the problem sets is to reinforce the material and sometimes to explore a topic in greater depth. You may talk with other students about the problems and work on them together. However, you should write up your own solutions. Copying someone else's solutions will not improve your understanding of the material and is not acceptable. Duplicate or nearly identical problem sets from different students will receive a score of zero. This has happened. We notice. Don’t let it happen to you!

You must write your own code on problem sets. You may discuss the programming problems with other students. The following two simple rules should make it clear what is not permitted:

  1. Do not copy or reuse code from any source (except the sample code provided).
  2. Do not share your code with anyone else in the class.