CCB Courses
Selected Hopkins Courses in Genomics and Bioinformatics
Course Descriptions
To register for courses, visit JHU's SIS site.
EN.601.447/647 Computational Genomics: Sequences
Your genome is the blueprint for the molecules in your body. It's also a string of letters (A, C, G and T) about 3 billion letters long. How does this string give rise to you? Your heart, your brain, your health? This, broadly speaking, is what genomics research is about. This course will familiarize you with a breadth of topics from the field of computational genomics. The emphasis is on current research problems, real-world genomics data, and efficient software implementations for analyzing data. Topics will include string matching, sequence alignment and indexing, assembly, and sequence models. The course will involve significant programming projects.
Prerequisite(s): EN.600.120/EN.601.220 AND EN.600.226/EN.601.226
Note: Students may receive credit for only one of EN.600.439, EN.600.639, EN.601.447, EN.601.647.
EN.601.350 Genomic Data Science
This course will use a project-based approach to introduce undergraduates to research in computational biology and genomics. During the semester, students will take a series of large data sets, all derived from recent research, and learn all the computational steps required to convert raw data into a polished analysis. Data challenges might include the DNA sequences from a bacterial genome project, the RNA sequences from an experiment to measure gene expression, the DNA from a human microbiome sequencing experiment, and others. Topics may vary from year to year. In addition to computational data analysis, students will learn to read the scientific literature critically by studying high-profile research papers that generated groundbreaking or controversial results. [Applications] Recommended Course Background: Knowledge of the Unix operating system and programming expertise in a language such as Perl or Python.
EN.601.446/646 Sketching & Indexing for Sequences
Many of the world's largest and fastest-growing datasets are text, e.g. DNA sequencing data, web pages, logs and social media posts. Such datasets are useful only to the degree we can query, compare and analyze them. Here we discuss two powerful approaches in this area. First, we will cover sketching, which enables us to summarize very large texts in small structures that allow us to measure the sizes of sets and of their unions and intersections. This in turn allows us to measure similarity and find near neighbors. Second, we will discuss indexing (succinct and compressed indexes in particular), which enables us to efficiently search inside very long strings, especially in highly repetitive texts. The course will involve significant programming projects.
Prerequisite(s): EN.601.220 AND EN.601.226
EN.580.743 Advanced Topics in Genome Data Analysis
Genomic data is becoming available in large quantities, but understanding how genetics contributes to human disease and other traits remains a major challenge. Machine learning and statistical approaches allow us to automatically analyze and combine genomic data, build predictive models, and identify genetic elements important to disease and cellular processes. This course will cover current uses of statistical methods and machine learning in diverse genomic applications including new genomic technologies. Students will present and discuss current literature. Topics include personal genomics, integrating diverse genomic data types, new technologies such as single cell sequencing and CRISPR, and other topics guided by student interest. The course will include a project component with the opportunity to explore publicly available genomic data. Recommended Course Background: coursework in data science or machine learning.
EN.601.448/648 Computational Genomics: Data Analysis
Genomic data has the potential to reveal causes of disease, novel drug targets, and relationships among genes and pathways in our cells. However, identifying meaningful patterns from high-dimensional genomic data has required development of new computational tools. This course will cover current approaches in computational analysis of genomic data with a focus on statistical methods and machine learning. Topics will include disease association, prediction tasks, clustering and dimensionality reduction, data integration, and network reconstruction. There will be some programming and a project component.
Prerequisites: EN.601.226 or other programming experience, probability and statistics, linear algebra or calculus.
Note: Students may receive credit for only one of EN.600.438, EN.600.638, EN.601.448, EN.601.648.
EN.601.749 Computational Genomics: Applied Comparative Genomics
The goal of this course is to study the leading computational and quantitative approaches for comparing and analyzing genomes starting from raw sequencing data. The course will focus on human genomics and human medical applications, but the techniques will be broadly applicable across the tree of life. The topics will include genome assembly & comparative genomics, variant identification & analysis, gene expression & regulation, personal genome analysis, and cancer genomics. The grading will be based on assignments, a midterm & final exam, class presentations, and a significant class project. [Applications] Expected course background: familiarity with UNIX scripting and/or programming.
ME.710.744 Genomic Technologies: Tools for Illuminating Biology and Dissecting Disease
[Description]
EN.580.244 Nonlinear Dynamics of Biological Systems
Analysis and simulation of nonlinear behavior in biological systems: bifurcations (cell-fate decision), limit cycles (cell cycle, neuronal excitations), chaos, and maps. MATLAB will be used to simulate these systems and motivate nonlinear analytic tools and stability analysis.
Recommended course background: AS.110.201 Linear Algebra, AS.110.302 Differential Equations, or EN.553.292 Linear Algebra and Differential Equations.
EN.601.452 / AS.020.415 Computational Biomedical Research & Advanced Biomedical Research
This course for advanced undergraduates includes classroom instruction in interdisciplinary research approaches and lab work on an independent research project in the lab of a Bloomberg Distinguished Professor or another distinguished faculty member. Lectures will focus on cross-cutting techniques such as data visualization, statistical inference, and scientific computing. In addition to two 50-minute classes per week, students will commit to working approximately 3 hours per week in the lab of one of the professors. The student and professor will work together to schedule the research project. Students will present their work at a symposium at the end of the semester.
Recommended course background: AS.110.201 Linear Algebra, AS.110.302 Differential Equations, or EN.553.292 Linear Algebra and Differential Equations.
EN.580.488/688 Foundations of Computational Biology and Bioinformatics
This course is designed to give students a foundation in the basics of statistical and algorithmic approaches developed in computational biology/bioinformatics over the past 30 years, while emphasizing the need to extend these approaches to emerging problems in the field. Topics covered include probabilistic modeling applied to biological sequence analysis, supervised machine learning, interpretation of genetic variants, cancer genomics bioinformatic workflows and computational immuno-oncology. Attending the lab section "Annotate Your Genome" is required.
Prerequisite(s): EN.601.220
EN.580.458/658 Computing the Transcriptome
This course will introduce computational tools used in the field of transcriptomics to analyze the genes and transcripts expressed in a living cell. Lectures will cover different practical ways to analyze large data sets generated by high-throughput RNA sequencing (RNA-seq) experiments, including alignment, assembly, and quantification. Students will learn how to use RNA-seq to answer questions such as: What is the complete set of human genes? How do we reconstruct the splice variants that are transcribed in different cell types and conditions? How do we compute which genes are differentially expressed between different RNA-seq datasets?
Prerequisite(s): (1) Familiarity with Python or Perl, (2) familiarity with the Unix command-line environment, and (3) a basic understanding of programming in R.
EN.580.248 Systems Biology of the Cell
Cellular systems biology provides a theoretical and quantitative understanding of the interactions between DNA, RNA, and proteins that create the well-regulated system we call life. This course develops first-principles models for the central dogma of molecular biology: information flow through protein signal transduction pathways, gene regulation by protein-DNA physical interactions, transcription of DNA to RNA, translation of RNA to protein, and feedback regulation that closes the cycle. Topics include complex analysis and contour integrals, spectral transforms, linear models for cell signaling, positive and negative feedback, nonlinearities introduced by saturation and cooperativity, information content and combinatorial regulation, and instabilities leading to cell fate specification.
Recommended Course Background: Linear Algebra, Systems and Controls, and programming.
EN.580.454 Methods in Nucleic Acid Sequencing
Sequencing technology is a rapidly progressing field that requires experience in both wet (molecular biology) and dry (computational analysis) techniques. This laboratory course will consist of three experimental modules that will provide students with valuable hands-on experience in DNA sequencing and analysis. Students will learn basic sequencing library preparation, perform sequencing experiments and analyze the resulting data. Experiments include human targeted sequencing, metagenomic sequencing and genome assembly.
Prerequisite(s): Students must have completed Lab Safety training prior to registering for this class.
ME.800.806 BCMB Computational Biology Bootcamp
This intensive one-week course is meant to immerse students in computation and to provide them with the foundational tools to apply modern computational techniques and appropriate statistics to their data.