STARSolo spliced/unspliced counts for non-10x libraries

Working with scRNAseq libraries produced by varying library chemistries can be challenging! Though STARSolo provides a way to obtain regular count matrices from SMARTSeq2 libraries, it currently does not support generating spliced/unspliced counts for any technology other than 10x.

We created these Snakemake workflows to address that limitation. They take trimmed, individual-cell fastq files from non-10x technologies and convert them into 10x-like data by adding unique barcode and UMI sequences to the bam files after alignment; the individual cell bams are then merged and passed to STARSolo to produce count matrices for all, spliced, and unspliced reads.
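To make the conversion step concrete, here is a minimal illustration of the idea (not the repository's actual code): each per-cell bam is given a fixed cell barcode (CB tag) and synthetic per-read UMIs (UB tag), so that the merged bam looks like a 10x library to STARSolo. The file names, barcode, and UMI scheme below are placeholders.

import itertools
import pysam

def tag_cell_bam(in_bam, out_bam, cell_barcode, umi_length=10):
    """Copy a bam, attaching a fixed cell barcode (CB) and a unique synthetic UMI (UB) to every read."""
    bases = "ACGT"
    # generator of unique synthetic UMIs (AAAAAAAAAA, AAAAAAAAAC, ...)
    umis = ("".join(combo) for combo in itertools.product(bases, repeat=umi_length))
    with pysam.AlignmentFile(in_bam, "rb") as src, \
         pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        for read, umi in zip(src, umis):
            read.set_tag("CB", cell_barcode, value_type="Z")  # 10x-style cell barcode tag
            read.set_tag("UB", umi, value_type="Z")           # synthetic UMI tag
            dst.write(read)
    # Note: for paired-end data both mates of a pair should carry the same UMI;
    # that bookkeeping is omitted here for brevity.

# Hypothetical example call: tag one cell's aligned bam with a made-up barcode.
tag_cell_bam("cell_001.bam", "cell_001.tagged.bam", cell_barcode="AAACCCAAGAAACACT")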

Included here are workflows for libraries produced with either single-end or paired-end sequencing.

Installation

  1. Clone the repository:

 git clone https://github.com/haleymac/STARSolo_for_non10x

  2. Install dependencies:

Create a conda env:

conda create -n starsolo_env
conda activate starsolo_env

Install the dependencies (STAR, samtools, GATK 4, and Snakemake are all available through the bioconda and conda-forge channels):

conda install -c conda-forge -c bioconda star=2.7.11a samtools=1.18 gatk4=4.3.0.0 snakemake=7.18.2

Repository contents and how to run the workflows

Run the STARSolo workflow

The paired_end and single_end folders contain the Snakemake workflows and config files needed to run the workflow on paired-end or single-end libraries, respectively.

To run either the single-end or paired-end workflow, fill out config.yaml with your system-specific information. You will need:

  • a library id (of your choice)
  • the path to a directory containing the individual cell fastq files
  • the path to the directory you would like the STARSolo output written to
  • the path to the working directory you are running the workflow in
  • a list of cell ids - your fastqs should be named {cellid}.fastq.gz for single-end libraries, or {cellid}_1.fastq.gz and {cellid}_2.fastq.gz for paired-end libraries

Once config.yaml is filled out and your conda environment is set up and activated, run the workflow like so, adjusting the number of cores to your system's available resources:

snakemake -s paired_end_starsolo_snakefile --cores 8

Or, if you are running on a Slurm cluster (as I was), call something like the following, adjusted to your system and its available resources:

snakemake \
    --snakefile paired_end_starsolo_snakefile \
    --jobs 2000 \
    --latency-wait 120 \
    --rerun-incomplete \
    --keep-going \
    --cluster 'sbatch --mem=100G --cpus-per-task 8 --time 5-0:0:0'

make_h5ad

Also included is make_h5ad.py, a command-line script that converts the STARSolo output of the workflow into an AnnData h5ad object that can easily be integrated into a scanpy workflow.

After the workflow is complete, use the script like so:

python make_h5ad.py --solo_indir /path/to/STARSolo/output/directory --cell_barcodes_csv /path/to/barcodecsv/output/by/starsolo/snakefile --h5ad output/h5ad/file
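From there, the object can be loaded straight into scanpy. A minimal sketch, assuming the output was written to library.h5ad (the layer names used for the spliced/unspliced matrices depend on how make_h5ad.py stores them, so inspect the object after loading):

import scanpy as sc

adata = sc.read_h5ad("library.h5ad")          # load the converted STARSolo counts
print(adata)                                  # inspect obs, var, and any layers (e.g. spliced/unspliced)
sc.pp.filter_cells(adata, min_genes=200)      # standard scanpy QC/preprocessing from here on
sc.pp.normalize_total(adata, target_sum=1e4)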

Dependencies

These workflows were tested with:

  • STAR = 2.7.11a
  • samtools = 1.18
  • gatk = 4.3.0.0
  • snakemake = 7.18.2
