Final project: submission

Author

Jelmer Poelstra

Published

April 5, 2024



The submission of your final project is due on Monday, April 29th. [25 points]


How to submit

As you did for the final project proposal, open a new GitHub issue for your repository and tag @jelmerp in the text body of the issue.


What to submit

Your repository should now contain:

  • A finished set of scripts.

  • Final documentation in one or more README files that clearly describes:

    • What the project does as a whole.
    • What each script does.
    • Where to access the data at OSC, assuming that the data is not in your repository.
    • How the project’s scripts can be rerun (see below).
  • A single runner script¹ that aims to rerun the full workflow.

  • A file (e.g. submission_notes.md) or a section in your main README file that provides some additional information for me to grade your project appropriately. Some examples of things you may want to include:

    • Additional instructions I will need to try and rerun your project, which I will be doing!
    • A note about any files in the repository that I should ignore.
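As an illustration of the runner script mentioned above, here is a minimal sketch. It is not a required layout: the step scripts (scripts/fastqc.sh, scripts/trim.sh, scripts/map.sh), all paths, and the DRY_RUN switch with its run helper are hypothetical and only there for illustration. With the default DRY_RUN=true, the sketch prints the sbatch commands instead of submitting them.

```bash
#!/usr/bin/env bash
# run.sh: hypothetical runner script for the full workflow
set -euo pipefail

# Settings: all paths and script names below are made-up placeholders
fastq_dir=data/fastq
qc_dir=results/fastqc
trim_dir=results/trim
map_dir=results/map
ref_fasta=data/ref/genome.fa

# In this sketch, commands are only printed unless DRY_RUN=false is set
DRY_RUN=${DRY_RUN:-true}
run() {
    if [[ "$DRY_RUN" == true ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: quality control of the raw reads
run sbatch scripts/fastqc.sh "$fastq_dir" "$qc_dir"

# Step 2: trim the raw reads
run sbatch scripts/trim.sh "$fastq_dir" "$trim_dir"

# Step 3: map the trimmed reads to the reference genome
run sbatch scripts/map.sh "$trim_dir" "$ref_fasta" "$map_dir"
```

Because sbatch returns as soon as a job is queued, a step that needs the output of an earlier step has to wait for that job to finish. You can either run the runner script step by step, or chain jobs with sbatch's --dependency option.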

Questions and advice

Don’t hesitate to contact me for questions about topics like:

  • Specific expectations for the final project that are unclear to you.
  • Whether you are on the right track in making some adjustments that I asked for after your progress report.
  • Advice on how to code or organize aspects of your project.

I’m happy to answer questions by e-mail or in a Zoom meeting!


Late submissions

Late submissions may be accommodated depending on the circumstances, but you will need to contact me before 12 pm on April 29th so that we can take it from there.

For late submissions with no advance notice, 4 points will be subtracted for each day the submission is late.


Graded aspects

Below is a list of graded aspects and what to aim for if you want a perfect score. I’m providing a lot of detail here, so there are no surprises. In summary, you should aim to have a reproducible, well-organized, and well-documented workflow; workflow size/complexity, on the other hand, is not that important².


Category (max. score), followed by examples of what your project should do to earn that score:

Project organization (max. score: 4)
  • Has a clear and appropriate directory structure.
  • Has informative and appropriate directory and file names.
  • Does not mix data, scripts, and results in individual directories.

Project background and documentation (max. score: 4)
  • Has a clear description of its background and goals.
  • Has a clear description of how different scripts are being used to achieve these goals.
  • Where appropriate, indicates what is still a work-in-progress (and optionally future directions).

Good practices in scripts (max. score: 4)
  • Has Bash scripts with proper set settings and similar good practices as taught in the course.
  • Has individual scripts that are not overly long and don’t do multiple unrelated things (except the runner script).
  • Has comments to document what is being done within scripts.
  • Has scripts that take arguments where appropriate, with limited or no “hard-coding” of potentially variable things like input/output dirs and file names. Anything along these lines that is “hard-coded” in the script is clearly set/stated at the top of scripts.
  • Uses few or no absolute paths in scripts.

Workflow reproducibility (max. score: 4)
  • Has a runner script that includes all steps in your analysis and that can be run by anybody with access to your repository and the raw data files.
  • Has information for me (or any other reader of the project!) about where at OSC to find the raw data files, and about other details needed to try to rerun the analyses.
  • Has appropriate software management with OSC modules and/or Conda environments (and/or containers), with versions for all software that you use clearly stated somewhere.

Slurm jobs at OSC (max. score: 3)
  • Has one or more scripts that are run as Slurm jobs at OSC.
  • Uses appropriate Slurm options.

Project/coding quality (max. score: 3)
  • Uses tools and commands that are, by and large, appropriate to accomplish its goals. Where this is not the case, there should be an explanation that the current setup is, e.g., suboptimal, meant for testing, or the result of running out of time. I will not dig into small details and parameter settings.

Version control (max. score: 3)
  • Has Git commit messages that are informative.
  • Has at least reasonably appropriate commits, e.g. individual Git commits don’t consist of multiple completely unrelated edits.
  • Has a single .gitignore file that ignores files like large raw data files, and in most cases, results files.
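To make the script-related criteria above more concrete (set settings, arguments instead of hard-coded paths, and Slurm options), here is a minimal sketch that you can paste into a terminal. Everything in it is an assumption for illustration: the script name count_lines.sh, the placeholder account PAS0000, and the arbitrary resource values. It is not a template you are required to follow. The example writes the script to a temporary directory and runs it with bash on a small test file. At OSC, you would submit such a script with sbatch instead.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a hypothetical, minimal analysis script to a temporary directory
demo_dir=$(mktemp -d)
cat > "$demo_dir/count_lines.sh" <<'EOF'
#!/usr/bin/env bash
# Slurm options: the account is a placeholder, the resource values are arbitrary
#SBATCH --account=PAS0000
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --output=slurm-count_lines-%j.out

# Strict Bash settings: stop on errors, unset variables, and pipe failures
set -euo pipefail

# Process command-line arguments instead of hard-coding paths
if [[ "$#" -ne 2 ]]; then
    echo "Usage: count_lines.sh <input_file> <output_dir>" >&2
    exit 1
fi
infile=$1
outdir=$2

# Report what will be done
echo "# Input file:  $infile"
echo "# Output dir:  $outdir"

# Create the output dir and count the lines in the input file
mkdir -p "$outdir"
outfile="$outdir"/$(basename "$infile").nlines
awk 'END { print NR }' "$infile" > "$outfile"

echo "# Done, results are in $outfile"
EOF

# Try the script on a small test file with three lines
printf 'line1\nline2\nline3\n' > "$demo_dir/test.txt"
bash "$demo_dir/count_lines.sh" "$demo_dir/test.txt" "$demo_dir/results"
cat "$demo_dir/results/test.txt.nlines"
```

Because the input file and output directory are passed as arguments, the same script can be reused for other files without editing it, and the runner script is the only place where actual paths appear.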

Good luck!!



Footnotes

  1. Or a Nextflow pipeline.

  2. See also the General Info page for the final project for some more general background.