🚀🧬 MrBiomics: Modules & Recipes augment Bioinformatics for Multi-Omics Analyses

"For many outcomes, roughly 80% of consequences come from 20% of causes (the "vital few")." - The Pareto Principle by Vilfredo Pareto

Get 80% of all standard (biomedical) data science analyses done semi-automated with 20% of the effort, by leveraging Snakemake's module functionality to use and combine pre-existing workflows into arbitrarily complex analyses.

[!IMPORTANT]
If you use MrBiomics, please don't forget to give credit to the authors by citing this original repository and the respective Modules and Recipes.

⏳ TL;DR - More Time for Science!

"Programming is about trying to make the future less painful. It’s about making things easier for our teammates." from The Pragmatic Programmer by Andy Hunt & Dave Thomas

Illustration of MrBiomics Modules, Recipes and Projects

[!NOTE]
Altogether, this enables complex, portable, transparent, reproducible, and documented analyses of multi-omics data at scale.

🧠 Functional Knowledge Management

"The best documentation is automation." - Wise Person on the Internet

Functional Knowledge Management (FKM) is our knowledge-management approach in which validated best practices are captured as executable software functions, modules, or recipes.

🧩 Modules

"Is it functional, multifunctional, durable, well-fitted, simple, easy to maintain, and thoroughly tested? Does it provide added value, and doesn't cause unnecessary harm? Can it be simpler? Is it an innovation?" - Patagonia Design Principles

Modules are Snakemake workflows, consisting of Rules for multi-step analyses, that are independent, single-purpose, and sufficiently abstracted to be compatible with most up- and downstream analyses. A module can be general-purpose (e.g., Unsupervised Analysis) or modality-specific (e.g., ATAC-seq processing). Currently, the following eleven modules are available, roughly ordered by their applicability from general to specific:

| Module | Type (Data Modality) | DOI | Version | Stars |
| --- | --- | --- | --- | --- |
| Unsupervised Analysis | General Purpose (tabular data) | DOI | GitHub Release | GitHub Repo stars |
| Fetch NGS Data and Metadata using iSeq | Bioinformatics (NGS data) | DOI | GitHub Release | GitHub Repo stars |
| Split, Filter, Normalize and Integrate Sequencing Data | Bioinformatics (NGS counts) | DOI | GitHub Release | GitHub Repo stars |
| Differential Analysis with limma | Bioinformatics (NGS data) | DOI | GitHub Release | GitHub Repo stars |
| Enrichment Analysis | Bioinformatics (genes/genomic regions) | DOI | GitHub Release | GitHub Repo stars |
| Genome Track Visualization | Bioinformatics (aligned BAM files) | DOI | GitHub Release | GitHub Repo stars |
| ATAC-seq Processing, Quantification & Annotation | Bioinformatics (ATAC-seq) | DOI | GitHub Release | GitHub Repo stars |
| RNA-seq Processing, Quantification & Annotation | Bioinformatics (RNA-seq) | DOI | GitHub Release | GitHub Repo stars |
| scRNA-seq Processing using Seurat | Bioinformatics (scRNA-seq) | DOI | GitHub Release | GitHub Repo stars |
| Differential Analysis using Seurat | Bioinformatics (scRNA-seq) | DOI | GitHub Release | GitHub Repo stars |
| Perturbation Analysis using Mixscape from Seurat | Bioinformatics (scCRISPR-seq) | DOI | GitHub Release | GitHub Repo stars |

[!NOTE]
⭐️ Star and share modules you find valuable 📤 — help others discover them, and guide our future work!

[!TIP] For detailed instructions on the installation, configuration, and execution of modules, check out the wiki. Generic instructions are also shown in each module's Snakemake workflow catalog entry.

📋 Projects using multiple Modules

“Absorb what is useful. Discard what is not. Add what is uniquely your own.” - Bruce Lee

Since Snakemake 6, you can (re-)use and combine pre-existing workflows within your projects by loading them as Modules. The combination of multiple modules into projects that analyze multiple datasets represents the overarching vision and power of MrBiomics.

[!NOTE] When applied to multiple datasets within a project, each dataset should have its own result directory within the project directory.
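
A minimal sketch of such a layout in plain Python, which can also be used inside a Snakefile. The project and dataset names are hypothetical, and placing each dataset directory directly under the project directory is only one possible reading of the note above.

```python
from pathlib import PurePosixPath


def result_dirs(project_dir, datasets):
    """Map each dataset name to its own result directory inside the project directory."""
    root = PurePosixPath(project_dir)
    return {name: root / name for name in datasets}


# Hypothetical project and dataset names, for illustration only.
dirs = result_dirs("my_project", ["dataset_A", "dataset_B"])
for name, path in dirs.items():
    print(f"{name} -> {path}")
```

Keeping the mapping in one place makes it easy to pass a distinct result path to every module configuration of a dataset.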

Three components are required to use a module within your Snakemake workflow (i.e., a project).
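
As a rough illustration of the module-loading part only, the Python sketch below renders the Snakemake statements that declare one module and import all of its rules under a name prefix. The `module` statement, the `github(...)` helper, and `use rule * from ... as ...` are Snakemake syntax. The helper `render_module_block`, the repository owner, the tag, and the `config` key are hypothetical placeholders, not values taken from MrBiomics.

```python
from textwrap import dedent


def render_module_block(name: str, repo: str, tag: str) -> str:
    """Render the Snakemake statements that load one module and import its rules."""
    return dedent(f"""\
        module {name}:
            snakefile:
                github("{repo}", path="workflow/Snakefile", tag="{tag}")
            config:
                config["{name}"]

        use rule * from {name} as {name}_*
        """)


# Hypothetical repository owner and tag; replace with the module's real release.
print(render_module_block("unsupervised_analysis", "owner/unsupervised_analysis", "v1.0.0"))
```

The printed text can be placed in a Snakemake rule file of the project. Prefixing the imported rules with the module name keeps rule names unique when several modules are combined.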

[!TIP] A full tutorial is available on the wiki.

📜 Recipes

"Civilization advances by extending the number of important operations which we can perform without thinking of them." - Alfred North Whitehead, author of Principia Mathematica

Recipes are combinations of existing modules into end-to-end best practice analyses. They can be used as templates for standard analyses by leveraging existing modules, thereby enabling fast iterations and progression into the unknown. Every recipe is described on a wiki page and demonstrated by applying it to a publicly available dataset.

| Recipe | Description | # Modules used |
| --- | --- | --- |
| RNA-seq Analysis | From raw BAM files to enrichments of differentially expressed genes. | 7 |
| ATAC-seq Analysis | From raw BAM files to enrichments of differentially accessible regions. | 7 |
| Integrative ATAC-seq & RNA-seq Analysis | From count matrices to epigenetic potential and relative transcriptional abundance. | 8 |
| scRNA-seq Analysis | From count matrix to enrichments of differentially expressed genes. | 5(-6) |
| scCRISPR-seq Analysis | From count matrix to knockout phenotype enrichments. | 6(-7) |

[!TIP] For detailed instructions, check out our How to use Recipes guide on the wiki.

[!NOTE]
⭐️ Star this repository and share recipes you find valuable 📤 — help others find them, and guide our future work!

📚 Resources

⭐ Star History of Modules

Star History Chart