Course Intro

Practical Computing Skills for Omics Data (PLNTPTH 6193)

Jelmer Poelstra

MCIC Wooster, Ohio State University

2024-02-27

Personal introductions

Introductions: Jelmer

  • Bioinformatician at MCIC in Wooster since June 2020
    • The majority of my time is spent providing research assistance,
      working with grad students and postdocs, mostly on genomic & transcriptomic data
    • I also teach: courses, workshops, and Code Club (https://osu-codeclub.github.io)

  • Background in animal evolutionary genomics & speciation

  • In my free time, I enjoy bird watching – locally & all across the world

Introductions: You

  • Name

  • Lab and Department

  • Research interests and/or current research topics

  • Something about you that is not work-related

Course overview

The core goals of this course

Learning skills to:

  • Do your research more reproducibly and efficiently

  • Prepare yourself for working with large “omics” datasets

Course background: Reproducibility

  • Two related ideas:

    1. Getting the same results with an independent experiment (replicable)

    2. Getting the same results given the same data (reproducible)

    Our focus is on #2.

Course background: Reproducibility (cont.)

“The most basic principle for reproducible research is: Do everything via code.”
—Karl Broman


  • Also important:

    • Project organization and documentation (week 2)

    • Sharing data and code (for code: Git & GitHub, week 3)

    • How you code (e.g., shell scripts in week 4 and Nextflow in week 6)


Another motivator: working reproducibly will benefit future you!

Course background: Efficiency and automation

  • Using code enables you to work more efficiently and to automate tasks,
    which is particularly useful when you have to:

    • Do repetitive tasks

    • Recreate a figure or redo an analysis after adding a sample

    • Redo a project after uncovering a mistake in the first data processing step

Course background: Omics data

  • Omics data is increasingly important in biology, and most notably includes:
    • Genomics + Microbiomics + Metagenomics
    • Transcriptomics
    • Proteomics
    • Metabolomics

  • While we’ll be using some example omics datasets, this course will not teach you how to analyze omics data in full — our focus is on fundamental computational skills.

Note that this course is less relevant for proteomics and metabolomics, especially in its slimmed-down, half-semester form without R.

Course topics

The Unix shell & shell scripts (Wk 1 & 4)

Being able to work in the Unix shell is a fundamental skill in computational biology.


  • You’ll spend a lot of time with the Unix shell, starting this week
  • We’ll also write shell scripts, using the VS Code editor for this and other purposes.


Bash (shell language)

VS Code

Project organization and Markdown (Wk 2)

Good project organization & documentation is a necessary starting point for reproducible research.


  • You’ll learn best practices for project organization, file naming, etc.

  • To document and report what you are doing, you’ll use Markdown files.


Markdown

Version control with Git and GitHub (Wk 3)

Using version control, you can more effectively keep track of project progress, collaborate, share code, revisit earlier versions, and undo.


  • Git is the version control software we will use,
    and GitHub is the website that hosts Git projects (repositories).

  • You’ll also use Git+GitHub to hand in your final project assignments.


High-performance computing with OSC (Wk 5)

Supercomputer resources let you work with very large datasets quickly: you can run up to hundreds of analyses in parallel, and use much more memory and storage space than a personal computer has.


  • We will use OSC throughout the course, and you’ll get a brief intro to it today.
  • In week 5, we’ll learn to submit shell scripts as OSC “batch jobs” with Slurm, and use Conda to manage software.


Automated workflow management (Wk 6)

With a workflow written using a workflow manager, you can run and rerun an entire analysis pipeline with a single command, and easily change and rerun parts of it, too.


  • We’ll use the workflow language Nextflow.


Course practicalities

Zoom

  • Be muted by default, but feel free to unmute yourself to ask questions any time.

  • Questions can also be asked in the chat.

  • Having your camera turned on as much as possible is appreciated!

  • “Screen real estate”: a large monitor, multiple monitors, or multiple devices work best.

  • Be ready to share your screen.

Websites & Books

  • I will only use the CarmenCanvas website for Announcements.

  • The GitHub website is the main website for the course, containing all course material:
    • Overviews of each week & readings
    • Slide decks and lecture pages
    • Exercises
    • Final project assignment information

  • Books:

    • Computing Skills for Biologists (“CSB”; Allesina & Wilmes 2019)
    • Bioinformatics Data Skills (“Buffalo”; Buffalo 2015)

Homework

Final project (graded)

Plan and implement a small computational project, with the following checkpoints:

  • I: Proposal (due week 4 – 10 points)

  • II: Draft (due week 6 – 10 points)

  • III: Oral presentations on Zoom (week 7 – 10 points)

  • IV: Final submission (due April 29 – 20 points)


Data sets for the final project

If you have your own data set & analysis ideas, that is ideal. If not, I can provide you with one.

More information about the final project will follow in week 2 or 3.

Ungraded homework

  • Weekly readings: when to do these is largely up to you

  • Weekly exercises — I recommend doing these on Fridays

  • Miscellaneous small assignments, such as surveys and account setup


Weekly materials & homework

I will try to add the materials for each week on the preceding Friday: at least the week’s overview and readings.

None of this homework has to be handed in.

Weekly recitation on Monday

If there is interest, we can have a weekly Monday meeting in which we go through the exercises for the preceding week.


If you’re interested, indicate your availability here:
https://www.when2meet.com/?23841132-KV8fY

Rest of this week

  • Brief introduction to the Ohio Supercomputer Center (OSC)

  • Unix shell basics

  • Homework:

Questions?