Building HPC-Compliant Snakemake Data Analysis Workflows

Europe/Berlin
N33 - ZDV
Christian Meesters
Description

Overview

This tutorial teaches participants how to design and implement High-Performance Computing (HPC) compliant data analysis workflows with the Snakemake workflow management system. Through hands-on exercises and practical demonstrations, attendees will learn how to use Snakemake to make effective use of HPC resources and to keep their data analysis workflows reproducible. Snakemake is widely used in bioinformatics, experimental physics, and other data analysis fields.

Agenda

  • Workflow Design and Implementation: Step-by-step instructions for designing and implementing HPC-compliant workflows using Snakemake, including best practices for parallelization and resource management.

  • Optimizing Performance: Techniques for optimizing workflow performance on HPC clusters, including resource allocation and avoiding I/O contention.

  • Ensuring Reproducibility: Strategies for ensuring reproducibility and scalability in data analysis workflows.

  • Case Studies and Practical Examples: Real-world case studies and practical examples demonstrating the application of HPC-compliant Snakemake workflows in various data analysis scenarios.

  • Introduction to Snakemake and HPC: Overview of the Snakemake workflow management system and the importance of HPC compliance in data analysis workflows.

  • Setting Up HPC Environment: Guidance on configuring Snakemake for HPC environments, including considerations for batch systems and job scheduling.

  • Publishing Workflows: Publishing and registering workflows with the Snakemake Workflow Catalog for better visibility and citability.

  • Q&A and Troubleshooting: Opportunity for participants to ask questions, seek clarification, and troubleshoot challenges encountered during the tutorial.
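Several agenda items revolve around parallelization and resource management. The toy Python sketch below illustrates the underlying idea: a job may start once every job it depends on has finished, and ready jobs are packed into the available cores. It is not Snakemake's scheduler; the function `schedule`, the job names, the thread counts and the greedy packing strategy are all hypothetical and chosen only for illustration.

```python
"""Toy illustration (NOT Snakemake's scheduler): run dependent jobs in waves
while respecting a total core budget."""


def schedule(jobs, deps, threads, max_cores):
    """Return a list of waves; each wave lists jobs that may run concurrently.

    jobs:      iterable of (hypothetical) job names
    deps:      dict mapping a job to the set of jobs it depends on
    threads:   dict mapping a job to the number of cores it requests
    max_cores: cores available at the same time (e.g. one batch allocation)
    """
    remaining = {job: set(deps.get(job, ())) for job in jobs}
    too_big = [job for job in remaining if threads[job] > max_cores]
    if too_big:
        raise ValueError(f"jobs request more than {max_cores} cores: {too_big}")

    done, waves = set(), []
    while remaining:
        # A job is ready when all of its dependencies have finished.
        ready = sorted(job for job, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cycle detected in job graph")
        wave, used = [], 0
        for job in ready:
            # Greedily add ready jobs until the core budget is used up.
            if used + threads[job] <= max_cores:
                wave.append(job)
                used += threads[job]
        for job in wave:
            del remaining[job]
        done.update(wave)
        waves.append(wave)
    return waves


# Hypothetical example: trim three samples, align each, then merge.
samples = ["A", "B", "C"]
deps = {f"align_{s}": {f"trim_{s}"} for s in samples}
deps["merge"] = {f"align_{s}" for s in samples}
threads = {**{f"trim_{s}": 2 for s in samples},
           **{f"align_{s}": 4 for s in samples},
           "merge": 1}
waves = schedule(threads.keys(), deps, threads, max_cores=8)
print(waves)
# [['trim_A', 'trim_B', 'trim_C'], ['align_A', 'align_B'], ['align_C'], ['merge']]
```

With 8 cores, only two 4-core alignment jobs fit side by side, so the third alignment is pushed into its own wave. A real workflow manager additionally has to submit jobs to the batch system, request memory and runtime, and handle failures; greedy packing is just one of many possible strategies.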

The course material (slides) will be made available to the class participants.

Learning Outcomes

By the end of this tutorial, participants will:

  • Understand the principles of HPC compliance in data analysis workflows.

  • Be proficient in configuring Snakemake for HPC environments and leveraging HPC resources effectively.

  • Be able to design, implement, and optimize HPC-compliant data analysis workflows using Snakemake.

  • Possess the skills to optimize workflow performance, ensure reproducibility, and troubleshoot common challenges in HPC environments.

  • Gain insights from real-world case studies and practical examples to apply HPC-compliant Snakemake workflows in their own research projects effectively.

Prerequisites

  • Ability to navigate the shell (bash) for basic file manipulation and command execution.

  • Ability to log in to remote servers via SSH (Secure Shell).

  • Familiarity with fundamental concepts of HPC, including job scheduling, parallel computing, and resource allocation, is beneficial.

  • Basic knowledge of the Python scripting language (variables, data structures, control flow statements, and functions) is beneficial.
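For orientation, the snippet below shows roughly the level of Python the prerequisites refer to (variables, data structures, control flow statements, and functions), applied to a typical workflow task: generating the list of input files for a set of samples. Snakemake ships its own `expand()` helper for this; the function `expand_paths` here is a hypothetical stand-in that does not reproduce Snakemake's implementation and only forms the plain product of all wildcard values.

```python
"""Hypothetical filename expansion, illustrating basic Python constructs."""
from itertools import product


def expand_paths(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    keys = list(wildcards)  # wildcard names, e.g. ["sample", "read"]
    paths = []
    for values in product(*(wildcards[k] for k in keys)):
        paths.append(pattern.format(**dict(zip(keys, values))))
    return paths


samples = ["A", "B"]  # a list (data structure)
files = expand_paths("data/{sample}_R{read}.fastq.gz", sample=samples, read=[1, 2])
for path in files:  # a loop (control flow)
    if path.endswith("R1.fastq.gz"):  # a condition
        print("forward read:", path)
```

Here `files` holds four paths, one per sample and read number, and the loop prints only the two forward-read files.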

Participants
  • Amritanshu Verma
  • Anila Ghazanfar
  • Anna Katharina Renner
  • Anne Busch
  • Carl Lang
  • Christian Siadjeu
  • Eric Schumbera
  • Helge Vatheuer
  • Jonas Tünnermann
  • Kshitija Naktode
  • Maike Grimm
  • Martin Machajewski
  • Maximilian Sprang
  • Nils Meisenheimer
  • Nina Luhmann
  • Piyush More
  • Roxanne Fraser
  • Ruth Moraa
  • Shweta Singh
  • Stefano Ruiz
  • Varenikova Aleksandra
  • Vasileios Xenidis
  • Yangzi Wang