Keep calm and dropSeqPipe – bioinformatic tools for analyzing your single cell research
Looking to analyze your single-cell RNA-Seq (scRNA-Seq) data in an easy and reproducible manner but don’t quite know how? You’ve come to the right place.
In this post I will describe the open-source pipeline dropSeqPipe (dSP), which is available free of charge on GitHub.
- What is dropSeqPipe and what is it used for?
- dropSeqPipe and quality control of single-cell data
- dropSeqPipe is an open-source pipeline!
- Generating graphs with dSP
What is dropSeqPipe and what is it used for?
dSP is an open-source bioinformatic tool based on the Drop-seq tools provided by the McCarroll Lab. dSP can be used to analyze scRNA-Seq data from a variety of single-cell platforms, including the Nadia platform, the Chromium platform, or any other scRNA-Seq instrument that uses paired-end reads in its sequencing workflow. At its core, the pipeline pre-processes raw data to create a digital gene expression matrix. Like 10x Genomics’s Cell Ranger, it requires a Unix-based system such as Linux or macOS.
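To make the idea of a digital gene expression matrix more concrete, here is a minimal Python sketch. It is not dSP's or the Drop-seq tools' code: the `digital_expression` function and the toy `(cell barcode, UMI, gene)` tuples are hypothetical stand-ins for tagged, aligned reads. It only shows the core counting idea, in which reads sharing the same cell barcode, UMI and gene are counted once. Real tools apply further steps (such as merging near-identical UMIs) that are left out here.

```python
from collections import defaultdict


def digital_expression(reads):
    """Count distinct UMIs per gene and cell barcode.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, a
    simplified, hypothetical stand-in for tagged and aligned reads.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        # Adding to a set collapses repeats of the same UMI.
        umis[(cell, gene)].add(umi)

    matrix = defaultdict(dict)
    for (cell, gene), seen in umis.items():
        matrix[gene][cell] = len(seen)
    return {gene: dict(cells) for gene, cells in matrix.items()}


# Toy input, invented for illustration only.
reads = [
    ("AAAC", "TTGA", "Actb"),
    ("AAAC", "TTGA", "Actb"),  # duplicate: same cell, UMI and gene
    ("AAAC", "GGCA", "Actb"),
    ("AAAC", "CATG", "Gapdh"),
    ("CCGT", "TTGA", "Actb"),
]
dge = digital_expression(reads)
print(dge)
```

Each row of the resulting matrix is a gene, each column a cell barcode, and each entry the number of distinct molecules observed.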
That said, dSP was designed to be as easy to operate as possible. It uses a modern pipeline management system called Snakemake, which replaces tedious one-by-one command execution with a reproducible, standardized workflow. As a bonus, Snakemake integrates with high-performance computing clusters and cloud services, which can come in handy for larger experiments. It is also surprisingly easy to set up.
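For readers who have not used a workflow manager before, the following Python sketch illustrates the general idea of a rule-based pipeline: you name the file you want, and the tool works out which steps to run and in what order. This is neither Snakemake syntax nor its API. The rule names, file names and the `plan` function are all made up for illustration.

```python
# Each output file maps to (rule name, list of input files).
# All names here are hypothetical and not taken from dSP.
RULES = {
    "sample.trimmed.fastq": ("trim", ["sample.fastq"]),
    "sample.bam": ("align", ["sample.trimmed.fastq"]),
    "sample.dge.txt": ("count", ["sample.bam"]),
}


def plan(target, rules):
    """Return the rule names needed to build `target`, in run order."""
    order = []

    def visit(path):
        if path not in rules:
            return  # a raw input file: nothing to run
        name, inputs = rules[path]
        for dep in inputs:
            visit(dep)  # make sure every input is produced first
        if name not in order:
            order.append(name)

    visit(target)
    return order


steps = plan("sample.dge.txt", RULES)
print(steps)
```

Asking for the final expression matrix is enough: the trimming and alignment steps are pulled in automatically because the later steps depend on their outputs.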
dropSeqPipe and quality control of single-cell data
Aside from the ability to rapidly and reproducibly pre-process data, a bioinformatic pipeline requires a good quality control (QC) tool to separate the wheat from the chaff. While Cell Ranger provides QC for researchers to understand how well the experiment worked (some metrics are explained here), dSP delivers more extensive QC information using FastQC, a QC tool for high-throughput sequencing data.
All results created with FastQC are then collated within MultiQC, a program that aggregates results from bioinformatics analyses across samples into a single report (Figure 1). Among the displayed parameters are sequence quality score (Phred score), GC content, sequence length distribution and duplication level, as well as overrepresented sequences and adapter contamination. Furthermore, knee plots help researchers separate real single-cell transcriptomes from background data, while tools like Scrublet automatically detect and remove doublets.
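The logic behind a knee plot can be sketched in a few lines of Python. Barcodes are sorted by their counts, and the cumulative fraction of counts rises steeply for real cells before flattening out for background barcodes. The cutoff heuristic below (the point on the curve farthest above the diagonal) is my own assumption for illustration and not how dSP selects cells. The `knee_point` function and the toy counts are hypothetical.

```python
def knee_point(counts):
    """Return how many barcodes lie before the knee of the cumulative curve.

    Heuristic (an assumption, not dSP's method): choose the rank at which
    the normalized cumulative curve is farthest above the diagonal.
    """
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    if n == 0 or total == 0:
        return 0

    best_rank, best_gap, running = 0, 0.0, 0
    for rank, count in enumerate(ordered, start=1):
        running += count
        # Cumulative fraction of counts minus fraction of barcodes used.
        gap = running / total - rank / n
        if gap > best_gap:
            best_rank, best_gap = rank, gap
    return best_rank


# Toy data: four barcodes with many counts, eight with very few.
counts = [5000, 4800, 4500, 4200, 30, 25, 20, 15, 10, 5, 5, 5]
n_cells = knee_point(counts)
print(n_cells)
```

In practice, the plot itself is still worth inspecting by eye, since the bend is not always this clear-cut.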
dropSeqPipe is an open-source pipeline!
This leads me to the next and probably biggest advantage of dSP, which is that it is open-source software (CC-BY-SA license). In the world of science, and single-cell research in particular, new protocols are developed at an incredibly fast rate. Having a bioinformatic tool that can keep up with these developments is essential.
Being open source, dSP is open to contributions from the user community. Other developers or users can suggest optimizations to the pipeline, allowing new analysis or visualization tools to be added and bugs to be fixed swiftly. New tools for analyzing single-cell data are being created almost as fast as new protocols are being developed. The developers can implement changes quickly, which helps the pipeline keep pace with state-of-the-art analysis tools.
Generating graphs with dSP
Feeling like your head is spinning after all this talk about Snakemake and FastQC? Here is even more info for you! After analysis, the next step is graphical data output: plotting those hard-earned results in a beautiful graph. The dSP pipeline with all its tools is designed to provide a reproducible, almost automatic workflow. One can, however, reach a point at which a specific experimental design requires manual intervention, for instance when generating graphs. This is usually the exciting bit, and it cannot be automated because requirements are often specific to a researcher’s needs.
dSP produces output that is tailored for Seurat, a quasi-standard data visualization package in the single-cell world. Seurat is an R package enabling even more QC, analysis, and exploration of single-cell RNA-seq data. Similar to the analysis package used in Cell Ranger, it enables the user to identify cellular heterogeneity and to integrate diverse single-cell data. Using Seurat, data can be visualized in a range of plots such as violin plots (Figure 2), heat maps or dimension reduction plots like PCA or t-SNE. Furthermore, a browser-based version of Seurat called SeuratWizard (Figure 3) was recently released that makes visualizing scRNA-Seq data easy and straightforward.
(Read our blog post Visualization of single cell data: From Seurat object to UMAP – An R tutorial to learn more)
dSP offers a greater set of parameters to play with than commercial pipelines. Moreover, the biggest advantage of dSP is that it is open source, free of charge and able to improve and develop at the same pace as new tools for single-cell analysis are created. All in all, it’s a great tool and we very much like using it for our in-house data analysis at Dolomite Bio.
However, if you want to avoid any form of command-line operation, there is always Partek Flow, a very easy-to-use browser-based tool provided by Partek. Partek Flow guides users through the analysis of scRNA-Seq data using defined steps and parameters, from raw data to data visualization.
So there you go: no matter what level of experience you have or what your analysis requirements are, there will always be a solution that meets your needs.