# Selected Hopkins courses in genomics and bioinformatics, 2016-17

| Term | Title | Course Number | Instructor |
|---|---|---|---|
| Spring 2017 | Introduction to Genomic Research | EN.600.340 | Steven Salzberg |
| Spring 2017 | Applied Comparative Genomics | EN.600.649 | Michael Schatz |
| Spring 2017 | Fundamentals of Genome Informatics | AS.020.355 | James Taylor |
| 1st | Perl for Bioinformatics | SPH.140.636 | Fernando Pineda |
| Spring 2017 | Frontiers of Sequencing Data Analysis | EN.600.640 | Ben Langmead |
| 1st | Statistics & Data Analysis Using R | ME.510.707 | Leslie Cope, Elana Fertig, Luigi Marchionni, and Robert Scharpf |
| 2nd | Genomics | SPH.260.605 | Jonathan Pevsner |
| 2nd | Analysis of Biological Sequences | SPH.140.638 | Sarah Wheelan |
| 3rd | Statistics for Laboratory Scientists I | SPH.140.615.01 | Ingo Ruczinski |
| 4th | Statistics for Genomics | SPH.140.688.01 | Kasper Hansen |

# Course Descriptions

To register for courses, visit JHU's ISIS site.

**EN.600.340 Introduction to Genomic Research (3 credits)**

*NOTE: This course is cross-listed in Computer Science and Biomedical Engineering.
Students in either major will receive credit for this course toward the requirements of their major.*

This course will use a project-based approach to introduce undergraduates to research in computational biology and genomics. During the semester, students will work through a series of large data sets, all derived from recent research, and learn all the computational steps required to convert raw data into a polished analysis. Data challenges might include DNA sequences from a bacterial genome project, RNA sequences from an experiment measuring gene expression, DNA from a human microbiome sequencing experiment, and others. Topics may vary from year to year. In addition to computational data analysis, students will learn to read the scientific literature critically by studying high-profile research papers that generated groundbreaking or controversial results.

Prerequisites: knowledge of the Unix operating system and programming expertise in a language such as Perl or Python.

**AS.020.355 Fundamentals of Genome Informatics**

This course will cover fundamental methods used in the analysis of genomic sequencing data, with a particular focus on recent
developments in comparative and functional genomic assays. Specifically, we will cover approaches for:

1) genomic sequencing and assembly, including resequencing and "personal" genomes;

2) comparing genomes and modeling genome evolution;

3) identifying functional elements using both "functional genomics" and computational models.

While the course will focus on particular problems in genomics, we will emphasize core algorithmic concepts that generalize to the analysis of other types of biological data.

**EN.600.649 Applied Comparative Genomics**

The goal of this course is to study the leading computational and quantitative approaches for comparing and analyzing genomes, starting from raw sequencing data. The course will focus on human genomics and human medical applications, but the techniques are broadly applicable across the tree of life. Topics will include genome assembly and comparative genomics, variant identification and analysis, gene expression and regulation, personal genome analysis, and cancer genomics. Grading will be based on assignments, a midterm and a final exam, class presentations, and a significant class project. There are no formal prerequisites, although the class project will require intermediate to advanced programming/scripting skills to develop a new analysis system and/or apply an existing one to novel datasets.

**SPH.140.636 Perl for Bioinformatics**

Uses the Perl programming language to introduce the skills and concepts needed to process and interpret data from high-throughput technologies in the biological sciences. Key concepts are introduced and reinforced through lectures with live computer demonstrations, weekly readings, and programming exercises. Exercises and examples draw heavily from biological sequence analysis as well as real-world problems in proteomics and genetics. Guest lecturers present case studies of Perl and UNIX usage in scientific investigations. Students are introduced to bioinformatics software-development resources available online and to the necessary computer science fundamentals.

Student evaluation is based on homework and a programming project.

Course learning objectives: Upon successful completion of this course, students will have:

1) a working knowledge of the Perl programming language, including the ability to read and write Perl scripts and to download and use Perl bioinformatics libraries such as BioPerl;

2) an understanding of programming techniques and styles, e.g. top-down vs. bottom-up programming, debugging, and object-oriented programming;

3) an understanding of fundamental concepts from computer science, including data structures, algorithms, and computational complexity;

4) the ability to organize the processing of large amounts of data from high-throughput biology experiments;

5) the ability to write automated scripts that query local and web-based biological databases;

6) the ability to search and use the wealth of software development resources available on the web, e.g. cpan.org, sourceforge.net, and bioperl.org.

**EN.600.640 Frontiers of Sequencing Data Analysis**

Public archives now contain petabytes of valuable but hard-to-analyze DNA sequencing data. Analyzing even small datasets is complicated by sequencing errors, differences between individuals, and the fragmentary nature of the sequencing reads. In this course, we study recent algorithms and methods that seek to make sense of DNA sequencing datasets, from small to very large. Topics covered will vary from year to year, but could include RNA sequencing data analysis, other functional genomics data analysis, metagenomics analysis, data compression, indexing, applications of streaming algorithms and sketch data structures, and assembly. There will be homework assignments and a course project. [Applications]

Prerequisite: 600.439/639 or permission of the instructor.

**ME.510.707 Statistics & Data Analysis Using R**

This course teaches participants to use the R computing environment to perform statistical tests, make informative graphs and plots, and manage data within a solid statistical framework.

Duration: 10 classes of 3 hours each, meeting twice a week.

Wednesdays and Fridays, October 24, 26, 21 and November 7, 9, 14, 16, 28, 30; 10 AM - 1 PM

**SPH.260.605 Genomics**

Explores genomes across the tree of life, using the tools of bioinformatics. Topics include viruses; bacteria and archaea; protozoa (e.g. Plasmodium); plants (with a focus on Arabidopsis and rice); the fungi; and the metazoans (Drosophila, C. elegans, the rodents, the primates, and human). Each lecture highlights features of the relevant genome(s), key websites and bioinformatics tools, the phylogenetic context in which to understand the significance of the organism, and genomics-based approaches to human disease. Weekly computer labs introduce students to genomics software available on the internet, including tools for genome annotation, comparison, and analysis.

Methods of assessment: one midterm exam, one final exam, and weekly quizzes.

Course learning objectives: After successfully completing this course, you will be able to:

1) define the main features of viral, prokaryotic, and eukaryotic genomes;

2) define the relevance of various genomes to human disease;

3) use web-based tools for genome analysis (e.g. annotation and comparison);

4) read papers on the sequencing of the human genome and other genomes, and evaluate the quality and limitations of the analytic approaches.

Class Times:

Monday 10:30 - 11:50

Wednesday 10:30 - 11:50

Lab Times:

Friday 10:30 - 11:50

**SPH.140.638 Analysis of Biological Sequences**

Presents an algorithmic approach to modern biological sequence analysis. Provides an overview of the core algorithms and statistical principles of bioinformatics. Topics include general probability and molecular biology background, sequence alignment (local, global, pairwise, and multiple), hidden Markov models (as powerful tools for sequence analysis), gene finding, and phylogenetic trees. Emphasizes an algorithmic perspective, although no prior programming experience is required. Covers basic probability and molecular biology in enough detail that no prior probability or advanced biology classes are required.

Grading: homework 60%, presentation plus written critique 30%, attendance 10%.

Course learning objectives: The general goal of the course is to provide students with an in-depth understanding of the algorithms and modeling ideas behind common tools used in genomic sequence research. Students are expected to develop the ability to independently construct models to address specific biological questions, and to carry out analyses and interpret the results independently. No prior programming experience is necessary; students may construct models and algorithms in pseudocode (a methodical description, in words, of the series of steps that a program would follow). The specific goals are:

1) Understand concepts in basic molecular biology and probability;

2) Be familiar with classic and modern pairwise alignment algorithms, including BLAST;

3) Understand the statistical significance of alignment scores and the interpretation of alignment algorithm output;

4) Understand the mechanism and the use of dynamic programming;

5) Be familiar with multiple alignment;

6) Understand the different assumptions about evolution made by different models and algorithms;

7) Understand the likelihood approach to phylogenetic reconstruction, and multiple alignment as applied to phylogenetic tree construction;

8) Understand Markov models and hidden Markov models (HMMs) in the genomic context, and essential algorithms for analyzing HMMs;

9) Understand HMMs as applied to gene finding. Be familiar with other algorithms in gene finding;

10) Identify from the literature important algorithmic/statistical advances in bioinformatics, and prepare an oral presentation of a recent bioinformatics publication that is important from either a biological or a mathematical perspective.

Class Times:

Tuesday 3:30 - 4:50

Thursday 3:30 - 4:50

**SPH.140.615.01 Statistics for Laboratory Scientists I**

Introduces the basic concepts and methods of statistics with applications in the experimental biological sciences. Demonstrates methods of exploring, organizing, and presenting data, and introduces the fundamentals of probability. Presents the foundations of statistical inference, including the concepts of parameters, estimates, and the use of confidence intervals and hypothesis tests. Topics include experimental design, linear regression, the analysis of two-way tables, and sample size and power calculations. Introduces and employs the freely available statistical software, R, to explore and analyze data.

Methods of Assessment: Homework assignments, one or two in-class exams.

Course learning objectives: Upon successful completion of this course, students will be able to:

1) create appropriate statistical graphics;

2) identify flaws in experimental designs and observational studies, and form appropriate simple experimental designs;

3) explain confounding and identify potential confounding factors in an observational study;

4) solve simple probability problems;

5) calculate and interpret confidence intervals for the difference between two populations' means and for a population proportion;

6) conduct simple tests of statistical hypotheses and calculate and interpret P-values from such tests;

7) calculate power and minimal sample size for simple experiments;

8) use the statistical software, R, to display and analyze data.

Class Times:

Mon, Wed, Fri 10:30 - 11:20

Lab Times:

Wednesday 1:30 - 2:20pm (01)

Wednesday 2:30 - 3:20pm (02)


**SPH.140.688.01 Statistics for Genomics**

Introduces statistical genomics with an emphasis on next-generation sequencing and microarrays. Covers the key capabilities of the Bioconductor project (a widely used open source software project for the analysis of high-throughput experiments in genomics and molecular biology, rooted in the open source statistical computing environment R). Also introduces statistical concepts and tools necessary to interpret and critically evaluate the bioinformatics and computational biology literature. Includes an overview of preprocessing and normalization, batch effects, statistical inference, and multiple comparisons. Intended for students with a background in statistics or biology, but not necessarily both. Assumes some familiarity with the R statistical language (a student without any experience in this language can still take the class but will need to set aside additional time to learn R).

Student evaluation will be based on data analysis homework assignments and a final project. Students who want to learn the concepts without programming may take the class pass/fail and perform a literature review for a final project.

Course learning objectives: Upon successful completion of this course, students will be able to:

1) Describe the basics of how various high-throughput assays work, including microarrays and next-generation sequencing;

2) Critique existing methodology for the analysis of high-throughput biological data;

3) Write R code to import and analyze microarray and next-generation sequencing data.

Class Times:

Tue, Thu 1:30 - 2:50pm
