# Welcome to scRNAbox's documentation!
ScRNAbox is a single-cell RNA sequencing (scRNAseq) pipeline specifically designed for analyzing data on High-Performance Computing (HPC) systems that use the [Slurm Workload Manager](https://slurm.schedmd.com/). The scRNAbox pipeline combines nine Analytical Steps into a comprehensive scRNAseq analysis that provides the foundation for further investigations. The nine Analytical Steps are outlined below.

<img src="https://github.com/neurobioinfo/scrnabox/assets/110110777/eccddd8e-4ea2-4c1e-9427-8ba40e6418ba" width="550" height="100">

The scRNAbox pipeline provides two distinct, yet highly comparable, Analysis Tracks:

1. **Standard scRNAseq**
2. **Cell Hashtag scRNAseq**

The **Standard Analysis Track** is designed for experiments in which each sample is captured and sequenced separately, while the **Cell Hashtag Analysis Track** is designed for multiplexed experiments, in which samples are tagged with sample-specific barcodes, pooled, and sequenced together. The Cell Hashtag Analysis Track is distinguished by an additional sample demultiplexing Step that assigns each cell to its sample of origin based on the sample-specific barcodes.

<img src="https://github.com/neurobioinfo/scrnabox/assets/110110777/3a6df83e-e104-45d2-9b04-fe246642c6a8" height="300">

For a comprehensive description of each Analytical Step, please see [Standard Analysis Track](SCRNA.md) and [Cell Hashtag Analysis Track](HTO.md). <br/>

For a tutorial that uses the datasets featured as application cases in our pre-print manuscript, please see [Standard Analysis: Midbrain dataset](Dataset1.md) (Smajic et al.) and [Cell Hashtag Analysis: PBMC dataset](Dataset2.md) (Stoeckius et al.).
- - - -

## Contents
- [Installation](installation.md)
- Overview:
    - [Standard Analysis Track](SCRNA.md)
    - [Cell Hashtag Analysis Track](HTO.md)
    - [Execution parameters](reference.md)
    - [Outputs](outputs.md)
- Tutorial:
    - [Standard Analysis Track: Midbrain dataset](Dataset1.md)
    - [Cell Hashtag Analysis Track: PBMC dataset](Dataset2.md)
- [Processed Data](PROC.md)
- [FAQ](FAQ.md)
- [Reference](reference.md)
- - - -
Finally, in preparation for Step 1 (FASTQ pre-processing with CellRanger), users must create a `library.csv` file and a `feature_ref.csv` file for each of their sequencing runs.<br />

#### library.csv
The `library.csv` file describes the FASTQ files of a sequencing run, including both the gene expression and the antibody capture assays. The `library.csv` file should be structured as follows: <br />
```
fastqs,sample,library_type
~/fastqs/,RUN1GEX,Gene Expression
~/fastqs/,RUN1HTO,Antibody Capture
```
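Before launching Step 1, it can help to confirm that every row of a `library.csv` such as the one above points to FASTQ files that CellRanger will recognize. The shell sketch below is a hypothetical helper, not part of scRNAbox: the function name `check_library_fastqs` is ours, it assumes a plain comma-separated file with a header row and no quoted fields, and it only counts files named like `<sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq[.gz]`, following the CellRanger naming convention (e.g. `CTRL1_S1_L001_R1_001.fastq`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of scRNAbox): for each row of a library.csv,
# count the files in its `fastqs` directory whose names follow the CellRanger
# convention <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq[.gz].
# Assumes a plain comma-separated file with a header row and no quoted fields.
check_library_fastqs() {
  local library_csv=$1 status=0
  local fastqs sample library_type f n
  # "|| [[ -n $fastqs ]]" also processes a last line without a trailing newline.
  while IFS=, read -r fastqs sample library_type || [[ -n $fastqs ]]; do
    fastqs=${fastqs/#\~/$HOME}   # expand a leading "~" for this check only
    n=0
    for f in "$fastqs"/*; do
      [[ -e $f ]] || continue     # directory missing or empty: glob did not match
      if [[ $(basename "$f") =~ ^${sample}_S[0-9]+_L[0-9]{3}_[RI][12]_001\.fastq(\.gz)?$ ]]; then
        n=$((n + 1))
      fi
    done
    if (( n == 0 )); then
      echo "MISSING: no CellRanger-style FASTQs for '$sample' ($library_type) in $fastqs"
      status=1
    else
      echo "OK: $n FASTQ file(s) for '$sample' ($library_type)"
    fi
  done < <(tail -n +2 "$library_csv")   # skip the header row
  return "$status"
}

# Usage: check_library_fastqs ~/working_directory/samples_info/run1/library.csv
```

The function prints one line per row and returns a non-zero status if any sample has no matching files, so it can gate a job submission script.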
- The `fastqs` column defines the path to the directory that contains the FASTQ files for the experiment. <br />
- The `sample` column defines the sample name of the corresponding FASTQ files. Please note that FASTQ files must be named according to the standard CellRanger nomenclature, for example "CTRL1_S1_L001_R1_001.fastq". For more information, please visit CellRanger's [documentation](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/fastq-input). <br />
- The `library_type` column defines the assay type. For the Cell Hashtag Analysis Track, each sequencing run should have a "Gene Expression" and an "Antibody Capture" assay. For more information, please visit CellRanger's [documentation](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis). <br />

For example, if the experiment comprises three sequencing runs, the following steps should be taken: <br />

1) Navigate to the working directory and create a `samples_info` folder: <br />
```
cd ~/working_directory
mkdir samples_info
```
2) Navigate to the `samples_info` folder and create a folder for each sequencing run: <br />
```
cd samples_info
mkdir run1
mkdir run2
mkdir run3
```
3) Navigate to the folder of each sequencing run and create its `library.csv` file. <br />

After performing steps 1-3 above, the structure of the `samples_info` folder for an experiment with three sequencing runs should be:
```
working_directory
└── samples_info
    ├── run1
    │   └── library.csv
    ├── run2
    │   └── library.csv
    └── run3
        └── library.csv
```
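Steps 1-3 can also be scripted. The sketch below is a hypothetical helper (the function name `make_library_csvs` is ours): it creates `samples_info/run1` to `samples_info/run3` under a given working directory and writes a `library.csv` into each. The `RUN<n>GEX`/`RUN<n>HTO` sample names extend the example above and, like the assumption that all runs share one FASTQ directory, are placeholders to adapt to your own runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create samples_info/run{1,2,3}/library.csv.
# Sample names and the shared FASTQ directory are placeholders.
make_library_csvs() {
  local working_directory=$1 fastq_dir=$2 run run_dir
  for run in 1 2 3; do
    run_dir="$working_directory/samples_info/run$run"
    mkdir -p "$run_dir"
    # One Gene Expression and one Antibody Capture library per sequencing run.
    cat > "$run_dir/library.csv" <<EOF
fastqs,sample,library_type
${fastq_dir}/,RUN${run}GEX,Gene Expression
${fastq_dir}/,RUN${run}HTO,Antibody Capture
EOF
  done
}

# Usage: make_library_csvs ~/working_directory ~/fastqs
```

Passing the paths unquoted on the command line lets the shell expand `~`, so each `library.csv` receives an absolute FASTQ path.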
#### feature_ref.csv
The `feature_ref.csv` file defines the information needed to process the sample-specific barcodes that are later used to demultiplex the pooled samples. For example, if four samples, each tagged with a unique barcode, are pooled together, the `feature_ref.csv` file should be structured as follows:
```
id,name,read,pattern,sequence,feature_type
Hash1,B0251_TotalSeqB,R2,5PNNNNNNNNNN(BC),GTCAACTCTTTAGCG,Antibody Capture
Hash2,B0252_TotalSeqB,R2,5PNNNNNNNNNN(BC),TGATGGCCTATTGGG,Antibody Capture
Hash3,B0253_TotalSeqB,R2,5PNNNNNNNNNN(BC),TTCCGCCTCTCTTTG,Antibody Capture
Hash4,B0254_TotalSeqB,R2,5PNNNNNNNNNN(BC),AGTAAGTTCAGCGTA,Antibody Capture
```
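A `feature_ref.csv` can be cross-checked against the `library.csv` of the same run before Step 1, since each `feature_type` has to match one of the run's `library_type` values. The awk-based sketch below is a hypothetical helper (the function name `check_feature_ref` is ours, and it assumes plain comma-separated files with a header row and no quoted fields); it reports any unmatched `feature_type` and any `sequence` containing characters other than A, C, G and T.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cross-check a run's feature_ref.csv against its library.csv.
# Assumes plain comma-separated files with a header row and no quoted fields.
check_feature_ref() {
  local library_csv=$1 feature_ref_csv=$2
  awk -F, '
    # First file (library.csv): remember every library_type (column 3).
    NR == FNR { if (FNR > 1) types[$3] = 1; next }
    # Second file (feature_ref.csv): skip the header row.
    FNR == 1 { next }
    !($6 in types)    { print "Line " FNR ": feature_type \"" $6 "\" not found in library.csv"; bad = 1 }
    $5 !~ /^[ACGT]+$/ { print "Line " FNR ": sequence \"" $5 "\" is not A/C/G/T only"; bad = 1 }
    END { exit bad }
  ' "$library_csv" "$feature_ref_csv"
}

# Usage: check_feature_ref run1/library.csv run1/feature_ref.csv
```

Matching is exact and case-sensitive, so a value such as "Antibody capture" is flagged even though it differs from "Antibody Capture" only in case.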
- The `id` column defines the barcode ID, which will be used to track the feature counts. <br />
- The `name` column defines an arbitrary name for the barcode identifier. <br />
- The `read` column defines which sequencing read contains the barcode sequence. This value will be either R1 or R2. <br />
- The `pattern` column defines the pattern used to locate the barcode sequence within the read. For more information, please visit the 10X Genomics [documentation](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis#pattern). <br />
- The `sequence` column defines the nucleotide sequence associated with the barcode identifier. <br />
- The `feature_type` column defines the type of feature used for sample identification. Please ensure that the `feature_type` in the `feature_ref.csv` file matches a `library_type` in the `library.csv` file. <br />

For more information regarding the preparation of the `feature_ref.csv` file, please see CellRanger's [documentation](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis).

`feature_ref.csv` files can be prepared the same way as the `library.csv` files. After creating the `feature_ref.csv` file for each sequencing run, the structure of the `samples_info` folder for an experiment with three sequencing runs should be:
```
working_directory
└── samples_info
    ├── run1
    │   ├── library.csv
    │   └── feature_ref.csv
    ├── run2
    │   ├── library.csv
    │   └── feature_ref.csv
    └── run3
        ├── library.csv
        └── feature_ref.csv
```
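If every sequencing run was hashed with the same antibody panel (an assumption; write a separate file per run otherwise), the `feature_ref.csv` files can be created in one pass. The sketch below is a hypothetical helper (the function name `write_feature_refs` is ours) that writes the four-hashtag TotalSeq-B example from the `feature_ref.csv` section into every `run*` folder under `samples_info`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the same feature_ref.csv into every run folder.
# Assumes all runs used the four TotalSeq-B hashtags from the example above.
write_feature_refs() {
  local samples_info=$1 run_dir
  for run_dir in "$samples_info"/run*/; do
    [[ -d $run_dir ]] || continue   # no run folders: glob did not match
    # Quoted 'EOF' disables expansion, so the rows are written verbatim.
    cat > "${run_dir}feature_ref.csv" <<'EOF'
id,name,read,pattern,sequence,feature_type
Hash1,B0251_TotalSeqB,R2,5PNNNNNNNNNN(BC),GTCAACTCTTTAGCG,Antibody Capture
Hash2,B0252_TotalSeqB,R2,5PNNNNNNNNNN(BC),TGATGGCCTATTGGG,Antibody Capture
Hash3,B0253_TotalSeqB,R2,5PNNNNNNNNNN(BC),TTCCGCCTCTCTTTG,Antibody Capture
Hash4,B0254_TotalSeqB,R2,5PNNNNNNNNNN(BC),AGTAAGTTCAGCGTA,Antibody Capture
EOF
  done
}

# Usage: write_feature_refs ~/working_directory/samples_info
```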