Single-cell RNA sequencing (scRNA-seq) Differential Expression Analysis & Visualization Snakemake Workflow

A Snakemake workflow for performing differential expression analyses (DEA) of processed (multimodal) scRNA-seq data, powered by the FindMarkers and FindAllMarkers functions of the R package Seurat.

This workflow adheres to the module specifications of MR.PARETO, an effort to augment research by modularizing (biomedical) data science. For more details, instructions, and modules, check out the project’s repository. Please consider starring and sharing modules that are useful to you; this helps me prioritize my efforts!

If you use this workflow in a publication, please give credit to the authors by citing it using the DOI 10.5281/zenodo.10689139.

Workflow Rulegraph

Authors

Software

This project wouldn’t be possible without the following software and their dependencies:

Software          Reference (DOI)
data.table        https://r-datatable.com
EnhancedVolcano   https://doi.org/10.18129/B9.bioc.EnhancedVolcano
future            https://doi.org/10.32614/RJ-2021-048
ggplot2           https://ggplot2.tidyverse.org/
pheatmap          https://cran.r-project.org/package=pheatmap
Seurat            https://doi.org/10.1016/j.cell.2021.04.048
Snakemake         https://doi.org/10.12688/f1000research.29032.2

Methods

This is a template for the Methods section of a scientific publication and is intended to serve as a starting point. Only retain paragraphs relevant to your analysis. References [ref] to the respective publications are curated in the software table above. Versions (ver) must be read from the respective conda environment specifications (workflow/envs/*.yaml files) or, after execution, from the result directory (/envs/scrnaseq_processing_seurat/*.yaml). Parameters that have to be adapted depending on the data or workflow configuration are denoted in square brackets, e.g., [X].
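Reading the versions out of an environment specification can be scripted. The following is a minimal sketch, assuming the dependencies are pinned as `name=version` lines; the package names and versions in the embedded example are made up for illustration and are not taken from this workflow's environment files.

```python
import re

# Hypothetical environment specification; package names and versions are made up.
ENV_YAML = """\
channels:
  - conda-forge
  - bioconda
dependencies:
  - r-seurat=4.3.0
  - r-data.table=1.14.8
  - bioconductor-enhancedvolcano=1.16.0
"""

# Matches dependency lines of the form "  - name=version" (also "name==version").
PIN = re.compile(r"^\s*-\s*([A-Za-z0-9_.\-]+)=+([^=\s]+)")


def pinned_versions(text):
    """Return a {package: version} dict for all pinned dependency lines."""
    versions = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


versions = pinned_versions(ENV_YAML)
print(versions["r-seurat"])
```

For real use, the text of each file matching workflow/envs/*.yaml would be passed to `pinned_versions` instead of the embedded string; unpinned dependencies and channel entries are skipped.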

The outlined analyses were performed using the R package Seurat (ver) [ref] unless stated otherwise.

Differential Expression Analysis (DEA). DEA was performed on the assay [X] and data slot [X] with Seurat’s [FindMarkers or FindAllMarkers] function using the statistical test [X], a log2(fold change) threshold of [X], and a minimum percentage of expression of [X]. The results were filtered for relevant features by an adjusted p-value of [X], an absolute log2(fold change) of [X], and a minimum percentage of expression of [X].
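The filtering step can be illustrated with a small sketch. It is not the workflow's own code (which runs in R); it only mirrors the three criteria named above on a toy table that uses the column names Seurat's FindMarkers returns (p_val, avg_log2FC, pct.1, pct.2, p_val_adj). The thresholds, the gene names, and the rule that the expression filter is applied to the larger of pct.1 and pct.2 are assumptions made for illustration.

```python
import csv
import io

# Hypothetical thresholds standing in for the [X] placeholders.
ADJ_P_MAX = 0.05
LOG2FC_MIN = 0.5
MIN_PCT = 0.1

# Toy results table with Seurat-style column names; all values are made up.
RAW = """feature,p_val,avg_log2FC,pct.1,pct.2,p_val_adj
GENE_A,1e-30,2.10,0.80,0.10,1e-26
GENE_B,1e-04,-1.40,0.05,0.60,2e-02
GENE_C,3e-02,0.20,0.50,0.45,1.0
GENE_D,1e-12,0.90,0.04,0.02,1e-08
"""


def keep(row):
    """Return True if a result row passes all three filters."""
    significant = float(row["p_val_adj"]) <= ADJ_P_MAX
    large_effect = abs(float(row["avg_log2FC"])) >= LOG2FC_MIN
    # Assumption: a feature counts as expressed if either group reaches MIN_PCT.
    expressed = max(float(row["pct.1"]), float(row["pct.2"])) >= MIN_PCT
    return significant and large_effect and expressed


rows = list(csv.DictReader(io.StringIO(RAW)))
filtered_features = [row["feature"] for row in rows if keep(row)]
print(filtered_features)
```

In this toy table, GENE_C fails the adjusted p-value and effect-size filters, and GENE_D fails the expression filter, so only GENE_A and GENE_B remain.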

Visualization. All filtered result statistics, i.e., the number of statistically significant results split by positive (up) and negative (down) effect sizes, were visualized separately with stacked bar plots using ggplot2 (ver) [ref]. To visually summarize the results of the same analysis, the filtered log2(fold change) values of features that were statistically significantly differentially expressed in at least one comparison were visualized in a hierarchically clustered heatmap using pheatmap (ver) [ref]. Volcano plots were generated for each analysis using EnhancedVolcano (ver) [ref] with an adjusted p-value threshold of [X] and a log2(fold change) threshold of [X] as visual cut-offs for the y- and x-axis, respectively.
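The inputs to these three plot types can be sketched as follows. This is an illustration of the underlying data shaping, not the workflow's R code; the group names, gene names, and values are made up, and filling missing feature/group combinations with 0 in the heatmap matrix is an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict

# Toy filtered results: (group, feature, avg_log2FC, p_val_adj); all values are made up.
FILTERED = [
    ("T_cell", "GENE_A", 2.1, 1e-26),
    ("T_cell", "GENE_B", -1.4, 2e-02),
    ("B_cell", "GENE_A", -0.8, 1e-05),
    ("B_cell", "GENE_C", 1.2, 1e-10),
]

# Stacked bar plot input: number of up- and down-regulated features per group.
counts = defaultdict(Counter)
for group, feature, log2fc, adj_p in FILTERED:
    counts[group]["up" if log2fc > 0 else "down"] += 1

# Heatmap input: feature x group matrix of log2(fold change) values.
# Assumption: combinations absent from the filtered results are filled with 0.
groups = sorted({group for group, *_ in FILTERED})
features = sorted({feature for _, feature, *_ in FILTERED})
lookup = {(feature, group): log2fc for group, feature, log2fc, _ in FILTERED}
matrix = [[lookup.get((f, g), 0.0) for g in groups] for f in features]

# Volcano plot coordinates: x = log2(fold change), y = -log10(adjusted p-value).
volcano = [(log2fc, -math.log10(adj_p)) for _, _, log2fc, adj_p in FILTERED]

print(dict(counts["T_cell"]))
print(groups, features)
```

Rows of `matrix` follow `features` and columns follow `groups`; hierarchical clustering, as done by pheatmap, would then reorder both.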

The analyses and visualizations described here were performed using a publicly available Snakemake (ver) [ref] workflow [10.5281/zenodo.10689139].

Features

The workflow performs the following steps to produce the outlined results (dea_seurat/{analysis}/).

Usage

Here are some tips for the usage of this workflow:

Configuration

Detailed specifications can be found in ./config/README.md.

Examples

We selected a scRNA-seq data set consisting of 15 colorectal cancer (CRC) samples from Lee et al. (2020), "Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer", Nature Genetics. The data were downloaded from the Colorectal Cancer section of the Weizmann Institute's Curated Cancer Cell Atlas (3CA).

The comparison contrasts the cell type marker expression, split by cell type and visualized as a dot plot, with the DEA results, visualized as a hierarchically clustered heatmap of the effect sizes.

[Figure: side-by-side comparison. Left panel (data source/authors): Cell Type Marker Dot plot. Right panel (Workflow Output): Cell Type Marker Dot plot.]

We provide metadata, annotation and configuration files for this data set in ./test. The processed and prepared Seurat RDS object has to be downloaded from Zenodo by following the instructions below.

  # download Zenodo records using zenodo_get

  # install zenodo_get v1.3.4
  conda install -c conda-forge zenodo_get=1.3.4

  # download the prepared Seurat RDS object
  zenodo_get --record 10688824 --output-dir=test/data/Lee2020NatGenet/

Links

Resources

Publications

The following publications successfully used this module for their analyses.