General information about your final project

Author

Jelmer Poelstra

Published

March 7, 2024



Devising a project

The goal for your final project is to apply some of the things you have learned during this course and to get more practice with them. While some aspects are required in order to get a good grade (see Graded aspects below), you have a fair amount of freedom. I recommend that you take advantage of that to make your final project as useful as possible for your own research and/or personal development.

Graded aspects for the project focus on documentation, reproducibility, and automation. Accordingly, there are no real requirements for the level of complexity or sophistication, the number of scripts, the real-world usefulness, and so on. I would actually recommend that you take care not to be too ambitious in this regard: start small and then expand if you are able to.

Some examples of possible types of projects:

  • Work with your own data or a publicly available dataset that is similar to what you are expecting to work with (see footnote 1).
  • You may decide it would be most useful for you to focus on coding and creating a reproducible workflow, and that this is easier with a trivial and maybe not-so-relevant dataset, so you don’t get bogged down in other details.

If you need help with selecting a dataset or a project topic, don’t hesitate to contact me. It would be easiest for me to provide you with a genomic/transcriptomic dataset, like RNA-Seq, microbial metabarcoding, or microbial isolate whole-genome sequencing, but I could also help you find or create another type of dataset.


Graded aspects

Graded aspects of your project focus on appropriate usage of the tools and principles we covered during the course. To receive a high grade, your project should:

  • Be well-organized: contained in a single parent directory with a clear and sensible structure of sub-directories, descriptive file and directory names, no files floating around with unclear purpose or source, and so on.

  • Be well-documented, with at least one README in Markdown format in the root directory of your project, and preferably with additional READMEs elsewhere as appropriate.

  • Be version-controlled with Git with regular, meaningful commits throughout. You will also need to push your repository to GitHub for the proposal and final submission checkpoints.

  • Contain shell scripts for data processing and/or analysis.

    • These shell scripts will likely run external command-line bioinformatics tools.

    • Having some components in other computer languages is fine, in case you know how to do things there that we didn’t learn in the course (for example, doing some plotting in R, or using Python for data processing).

    • Try to minimize manual work such as editing a data file in a text editor, or using Excel to fix column names, since this hinders reproducibility.

  • Run one or more scripts as Slurm batch jobs at OSC.

  • Be easily re-runnable using a “runner script” (see footnote 2). Note that after your final submission, I will actually try to rerun your project as part of the grading process!
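
To make the last two points more concrete, here is a minimal sketch of what a runner script that submits Slurm batch jobs could look like. It is only an illustration, not a required template: the script name run.sh, the script scripts/fastqc.sh, the directories data/fastq and results/fastqc, the DRY_RUN variable, and the submit_job function are all hypothetical, so adapt them to your own project.

```shell
#!/usr/bin/env bash
# run.sh (hypothetical name): reruns every step of the project, in order,
# from the project's root directory.
set -euo pipefail
shopt -s nullglob   # A glob without matches expands to nothing

# Assumption: print the commands instead of submitting them (a "dry run")
# when sbatch is unavailable. Set DRY_RUN=true or DRY_RUN=false to override.
if command -v sbatch > /dev/null 2>&1; then
    DRY_RUN=${DRY_RUN:-false}
else
    DRY_RUN=${DRY_RUN:-true}
fi

# Submit a script as a Slurm batch job, or only print the command in a dry run
submit_job() {
    if [[ "$DRY_RUN" == true ]]; then
        echo "sbatch $*"
    else
        sbatch "$@"
    fi
}

# Hypothetical input and output directories
fastq_dir=data/fastq
qc_dir=results/fastqc

# Step 1: read quality control, one batch job per FASTQ file.
# scripts/fastqc.sh stands in for one of your own shell scripts.
for fastq in "$fastq_dir"/*.fastq.gz; do
    submit_job scripts/fastqc.sh "$fastq" "$qc_dir"
done
```

Running `DRY_RUN=true bash run.sh` prints the sbatch commands without submitting anything, which can serve as a quick check before a full rerun. Later steps that depend on the output of earlier batch jobs should only be started once those jobs have finished.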


Checkpoints

Information about expectations for each of the checkpoints (submissions & oral presentation) for the project will be provided in the content for individual weeks. Below is an overview which will be updated with links to pages with information about each:

What                        | Due                             | Points
Proposal                    | April 8 (Mon)                   | 10
In-class oral presentations | April 16 (Tue) & April 18 (Thu) | 15
Final submission            | April 29 (Mon)                  | 25



Footnotes

  1. Note that “subsetting” the data may be useful or necessary if you have a large genomic dataset, and I can help you with that.

  2. Or optionally a Nextflow workflow.