scCRISPR-seq Perturbation Analysis Workflow using Seurat’s Mixscape
A Snakemake 8 workflow for performing perturbation analyses of pooled (multimodal) CRISPR screens with scRNA-seq read-out (scCRISPR-seq, CROP-seq, Perturb-seq), powered by Mixscape, a method implemented in the R package Seurat.
[!NOTE]
This workflow adheres to the module specifications of MrBiomics, an effort to augment research by modularizing (biomedical) data science. For more details, instructions, and modules check out the project’s repository. ⭐️ Star and share modules you find valuable 📤 - help others discover them, and guide our future work!
[!IMPORTANT]
If you use this workflow in a publication, please don’t forget to give credit to the authors by citing it using this DOI 10.5281/zenodo.8424761.
Authors
💿 Software
This project wouldn’t be possible without the following software and its dependencies:
Software | Reference (DOI) |
---|---|
data.table | https://r-datatable.com |
ggplot2 | https://ggplot2.tidyverse.org/ |
Mixscape | https://doi.org/10.1038/s41588-021-00778-2 |
mixtools | https://CRAN.R-project.org/package=mixtools |
patchwork | https://CRAN.R-project.org/package=patchwork |
Seurat | https://doi.org/10.1016/j.cell.2021.04.048 |
Snakemake | https://doi.org/10.12688/f1000research.29032.2 |
🔬 Methods
This is a template for the Methods section of a scientific publication and is intended to serve as a starting point. Only retain paragraphs relevant to your analysis. References [ref] to the respective publications are curated in the software table above. Versions (ver) have to be read out from the respective conda environment specifications (`workflow/envs/*.yaml` files) or post-execution in the result directory (`mixscape_seurat/envs/*.yaml`). Parameters that have to be adapted depending on the data or workflow configurations are denoted in square brackets, e.g., [X].
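As a hedged illustration of reading versions out of an environment specification, the following Python sketch extracts pinned package versions from the text of a conda environment YAML file. The file content and version numbers shown are hypothetical placeholders, and the helper `pinned_versions` is not part of the workflow. It only handles simple `- name=version` dependency entries; a full YAML parser would be more robust.

```python
import re


def pinned_versions(env_yaml_text):
    """Return {package: version} for simple '- name=version' dependency lines.

    Assumption: dependencies are listed one per line as '- name=version',
    optionally with a channel prefix such as 'conda-forge::name=version'.
    Unpinned entries are ignored.
    """
    versions = {}
    in_dependencies = False
    for raw_line in env_yaml_text.splitlines():
        line = raw_line.strip()
        if line.startswith("dependencies:"):
            in_dependencies = True
            continue
        if not in_dependencies:
            continue
        match = re.match(r"^-\s*(?:[\w.-]+::)?([\w.-]+)=+([\w.]+)", line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


# Hypothetical file content with placeholder versions; the real values
# come from the files in workflow/envs/ or mixscape_seurat/envs/.
example_env = """\
channels:
  - conda-forge
dependencies:
  - r-seurat=1.2.3
  - r-mixtools=4.5.6
  - r-ggplot2
"""

versions = pinned_versions(example_env)
print(versions)
```

The resulting mapping can then be used to fill in the (ver) placeholders in the paragraphs below.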
The outlined analyses were performed using the R package Seurat (ver) [ref] unless stated otherwise.
Mixscape. We applied the Mixscape workflow [ref], implemented in Seurat, on each [sample] separately as well as all [samples] simultaneously to identify perturbed cells compared to non-targeting (NT) guide RNA (gRNA) assigned cells. Briefly, cells putatively assigned to a gRNA and respective knockout (KO) target gene in conjunction with NT cells were used to calculate cell-wise perturbation signatures by using Seurat::CalcPerturbSig to subtract the average expression profile of the [n_neighbors] closest NT cells in [ndims]-dimensional PCA space. Using Seurat::RunMixscape, with a log2(fold change) threshold of [lfc_th] and a minimum of [min_de_genes] differentially expressed genes, cells were classified as perturbed or non-perturbed using posterior probabilities of an expectation-maximization (EM) algorithm for mixtures of univariate normals, assuming each putatively annotated target gene group is a mixture of two Gaussian distributions (perturbed signal and non-perturbed background).
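To make the two steps described above more concrete, here is a minimal Python sketch under stated assumptions. It is a generic textbook illustration, not the Seurat or mixtools implementation: `perturbation_signature` subtracts the mean expression of the nearest NT cells in PCA space, and `em_two_gaussians` fits a two-component univariate Gaussian mixture with EM and returns posterior probabilities. All function names, the toy data, the initialization, and the 0.5 posterior cutoff are assumptions made for illustration.

```python
import math


def perturbation_signature(cell, cell_pc, nt_cells, nt_pcs, n_neighbors):
    """Subtract the mean expression of the n_neighbors closest NT cells.

    cell and nt_cells hold expression vectors; cell_pc and nt_pcs hold the
    matching coordinates in PCA space, used only to find neighbors.
    """
    order = sorted(range(len(nt_cells)),
                   key=lambda i: math.dist(cell_pc, nt_pcs[i]))
    nearest = order[:n_neighbors]
    nt_mean = [sum(nt_cells[i][g] for i in nearest) / len(nearest)
               for g in range(len(cell))]
    return [value - mean for value, mean in zip(cell, nt_mean)]


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(scores, n_iter=100):
    """Return P(perturbed | score) per cell from a two-Gaussian mixture.

    Assumption of this sketch: the component initialized on the upper half
    of the sorted scores is treated as the 'perturbed' component.
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    overall = sum(scores) / len(scores)
    sd0 = max(math.sqrt(sum((s - overall) ** 2 for s in scores) / len(scores)), 1e-6)
    sds = [sd0, sd0]
    weights = [0.5, 0.5]
    posteriors = [0.5] * len(scores)
    for _ in range(n_iter):
        # E-step: posterior probability of the 'perturbed' component.
        posteriors = []
        for s in scores:
            p0 = weights[0] * normal_pdf(s, means[0], sds[0])
            p1 = weights[1] * normal_pdf(s, means[1], sds[1])
            total = p0 + p1
            posteriors.append(p1 / total if total > 0 else 0.5)
        # M-step: update weight, mean, and standard deviation per component.
        for k in (0, 1):
            resp = [p if k == 1 else 1.0 - p for p in posteriors]
            mass = sum(resp)
            if mass < 1e-12:
                continue
            weights[k] = mass / len(scores)
            means[k] = sum(r * s for r, s in zip(resp, scores)) / mass
            var = sum(r * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / mass
            sds[k] = max(math.sqrt(var), 1e-6)
    return posteriors


# Toy data (hypothetical): three NT cells, two genes, one PCA dimension.
signature = perturbation_signature(
    cell=[5.0, 5.0], cell_pc=[0.4],
    nt_cells=[[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]],
    nt_pcs=[[0.0], [1.0], [10.0]],
    n_neighbors=2,
)

# Toy perturbation scores: five background-like, five perturbed-like cells.
scores = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
posteriors = em_two_gaussians(scores)
labels = ["perturbed" if p > 0.5 else "non-perturbed" for p in posteriors]
print(signature, labels)
```

In the workflow itself these steps are delegated to Seurat::CalcPerturbSig and Seurat::RunMixscape; the sketch only shows the underlying idea on toy numbers.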
Visualizations. Statistics of the Mixscape classification of perturbed cells versus cells with no detectable perturbation were visualized on a target gene and gRNA basis as bar plots. Perturbation scores of cells, split by their Mixscape classification, were visualized as density plots. Posterior probability values of non-perturbed and perturbed cells were visualized as violin plots using the Seurat function VlnPlot. Perturbation scores and posterior probabilities were additionally plotted split by replicates [split_by_col] and experiment conditions [split_by_col]. Surface protein expression measured by Antibody Capture technologies was visualized as violin plots, split by the perturbation classification of cells, using the Seurat function VlnPlot.
Linear discriminant analysis (LDA). LDA was applied to the perturbation signatures of all perturbed and NT cells using Seurat::MixscapeLDA with [npcs] principal components per KO class to find the most discriminative subspace given the KO/NT classes. The data were projected into this subspace and visualized in two dimensions using UMAP with Seurat::RunUMAP.
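As a rough illustration of what "most discriminative subspace" means, the following Python sketch computes Fisher's discriminant direction for two classes in two dimensions and projects the points onto it. This is a simplified two-class case on hypothetical toy data, not how Seurat::MixscapeLDA is implemented, which handles multiple KO classes. The function names and data are assumptions of this sketch.

```python
def fisher_lda_direction(class0, class1):
    """Return the 2D Fisher discriminant direction w = Sw^-1 (m1 - m0)."""
    def mean(points):
        n = len(points)
        return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

    m0, m1 = mean(class0), mean(class1)

    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for points, m in ((class0, m0), (class1, m1)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy

    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")

    # Multiply the inverse of the 2x2 scatter matrix by the mean difference.
    dmx, dmy = m1[0] - m0[0], m1[1] - m0[1]
    return [(syy * dmx - sxy * dmy) / det, (-sxy * dmx + sxx * dmy) / det]


def project(points, w):
    """Project 2D points onto the direction w."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]


# Hypothetical toy data: NT cells versus cells of one perturbed KO class.
nt_cells = [[0.0, 0.0], [1.0, 0.2], [0.2, 1.0], [1.0, 1.0]]
ko_cells = [[4.0, 0.1], [5.0, 0.0], [4.2, 1.1], [5.0, 0.9]]

w = fisher_lda_direction(nt_cells, ko_cells)
nt_projection = project(nt_cells, w)
ko_projection = project(ko_cells, w)
print(max(nt_projection) < min(ko_projection))
```

On this toy data the two classes separate along the discriminant direction; with more than two classes, LDA yields several such directions, which is why a subsequent 2D embedding such as UMAP is useful for visualization.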
The analysis and visualizations described here were performed using a publicly available Snakemake (ver) [ref] workflow (DOI: 10.5281/zenodo.8424761).
🚀 Features
The workflow performs all steps of the Mixscape Vignette on all samples in the annotation file according to the parametrization in the config file.
- Calculation of local perturbation signatures (`{analysis}/`)
  - all and filtered (i.e., only perturbed cells) perturbation signatures (`{ALL|FILTERED}_PRTB_data.csv`).
- Mixscape classification of perturbed cells versus cells with no detectable perturbation (`{analysis}/{ALL|FILTERED}_*`)
  - Mixscape classification statistics (`{analysis}/mixscape_stats.csv`).
- Visualization of Mixscape results (`{analysis}/plots/`)
  - Statistics of the Mixscape classification on a target gene and guide RNA basis as bar plots (`stats/{KO}.png`).
  - Perturbation scores of cells split by their Mixscape classification as density plots (`PerturbScore/{KO}_{split}.png`).
  - Posterior probability values in non-perturbed and perturbed cells as violin plots (`PosteriorProbability/{KO}_{split}.png`).
  - (optional) if Antibody Capture was used: Surface protein expression measurements split by perturbation classification of cells as violin plots (`{Antibody_Capture_flag}_expression/{protein}.png`).
- Analysis of perturbation responses with Linear Discriminant Analysis (LDA)
  - LDA components (`LDA_data.csv`)
  - 2D visualization using UMAP as scatter plot (`{analysis}/plots/LDA_UMAP`).
🛠️ Usage
Read the Mixscape Vignette.
⚙️ Configuration
Detailed specifications can be found in `./config/README.md`.
📖 Example
— COMING SOON —
🔗 Links
📚 Resources
- Recommended compatible MrBiomics modules for
  - upstream processing:
    - scRNA-seq Data Processing & Visualization for processing (multimodal) single-cell transcriptome data.
  - downstream analyses:
    - Unsupervised Analysis to understand and visualize similarities and variations between cells/samples, including dimensionality reduction and cluster analysis. Useful for all tabular data including single-cell and bulk sequencing data.
    - Differential Analysis using Seurat to identify and visualize statistically significantly different features (e.g., genes or proteins) between groups.
    - Enrichment Analysis for biomedical interpretation of (differential) analysis results using prior knowledge.
- Mixscape publication: Papalexi et al. (2021) Nature Genetics - “Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens.”
📑 Publications
The following publications successfully used this module for their analyses.