Workflows Directory Overview

Welcome to the Workflows directory of the Tumor Evolution and Genomics Section Wiki. This directory is a resource for understanding the tools and processes commonly used in our research. Each guide gives a foundational understanding of a tool's purpose and its typical use cases.

Contents

  • Biowulf: Biowulf is the NIH high-performance computing cluster for computational biology, and we run all of our work on it. This guide describes some specialized workflows we use there.

  • Project Organization: Good project organization is critical for maintainable, shareable, and reproducible projects. This guide details one way to organize a project directory. Even if you don't use this exact layout, you should understand it so that your own project incorporates the same key concepts.

  • Conda: Conda is an open-source package and environment management system. It installs, runs, and updates packages along with their dependencies, and it can create isolated environments so that each project's dependencies are managed separately.

  • Jupyter: Jupyter Notebook is an open-source web application for creating and sharing documents that combine live code, equations, visualizations, and narrative text. Notebooks are widely used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, and machine learning.

  • Tmux: Tmux is a terminal multiplexer that lets you run multiple terminal sessions within a single window. It keeps sessions running in the background and helps organize work into manageable sessions. This guide covers basic commands and some specialized Biowulf workflows that we use.

  • Nextflow: Nextflow is a workflow management system for building scalable, reproducible scientific pipelines that can run inside software containers. It simplifies deploying complex data analysis pipelines across different computing infrastructures, which is why we use it to keep our workflows reproducible and scalable.