STARSolo spliced/unspliced counts for non-10x libraries

Working with scRNAseq libraries produced by varying library chemistries can be challenging! Though STARSolo provides a way to obtain regular count matrices from SMARTSeq2 libraries, it currently does not support generating spliced/unspliced counts for any technology other than 10x.

We created these Snakemake workflows to address that limitation. They take trimmed, individual-cell fastq files from non-10x technologies and convert them into 10x-like data by adding unique barcode and UMI sequences to the bam files after alignment; the individual cell bams are then merged and passed to STARSolo to produce count matrices for all, spliced, and unspliced reads.
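To make the conversion step concrete, here is a minimal illustration of the idea (not the repository's actual code): each per-cell bam is given a fixed cell barcode (CB tag) and synthetic per-read UMIs (UB tag), so that the merged bam looks like a 10x library to STARSolo. The file names, barcode, and UMI scheme below are placeholders.

import itertools
import pysam

def tag_cell_bam(in_bam, out_bam, cell_barcode, umi_length=10):
    """Copy a bam, attaching a fixed cell barcode (CB) and a unique synthetic UMI (UB) to every read."""
    bases = "ACGT"
    # generator of unique synthetic UMIs (AAAAAAAAAA, AAAAAAAAAC, ...)
    umis = ("".join(combo) for combo in itertools.product(bases, repeat=umi_length))
    with pysam.AlignmentFile(in_bam, "rb") as src, \
         pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        for read, umi in zip(src, umis):
            read.set_tag("CB", cell_barcode, value_type="Z")  # 10x-style cell barcode tag
            read.set_tag("UB", umi, value_type="Z")           # synthetic UMI tag
            dst.write(read)
    # Note: for paired-end data both mates of a pair should carry the same UMI;
    # that bookkeeping is omitted here for brevity.

# Hypothetical example call: tag one cell's aligned bam with a made-up barcode.
tag_cell_bam("cell_001.bam", "cell_001.tagged.bam", cell_barcode="AAACCCAAGAAACACT")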

Included here are workflows for libraries produced with either single-end or paired-end sequencing.

Installation

  1. Clone the repository:

 git clone https://github.com/haleymac/STARSolo_for_non10x

  2. Install dependencies:

Create a conda env:

conda create -n starsolo_env
conda activate starsolo_env

Install the dependencies (STAR, samtools, GATK 4, and Snakemake are all available through the bioconda and conda-forge channels):

conda install -c conda-forge -c bioconda star=2.7.11a samtools=1.18 gatk4=4.3.0.0 snakemake=7.18.2

Repository contents and how to run the workflows

Run the STARSolo workflow

The paired_end and single_end folders contain the Snakemake workflows and config files needed to run the workflow on paired-end or single-end libraries, respectively.

To run either the single-end or paired-end workflow, fill out config.yaml with your system-specific information. You will need:

  • a library id (of your choice)
  • the path to a directory containing the individual cell fastq files
  • the path to the directory you would like the STARSolo output written to
  • the path to the working directory you are running the workflow in
  • a list of cell ids - your fastqs should be named {cellid}.fastq.gz for single-end libraries, or {cellid}_1.fastq.gz and {cellid}_2.fastq.gz for paired-end libraries

Once config.yaml is filled out and your conda environment is set up and activated, run the workflow like so, adjusting the number of cores to your system's available resources:

snakemake -s paired_end_starsolo_snakefile --cores 8

Or, if you are running on a Slurm cluster (as I was), call something like the following, adjusted to your system and its available resources:

snakemake \
    --snakefile paired_end_starsolo_snakefile \
    --jobs 2000 \
    --latency-wait 120 \
    --rerun-incomplete \
    --keep-going \
    --cluster 'sbatch --mem=100G --cpus-per-task 8 --time 5-0:0:0'

make_h5ad

Also included is make_h5ad.py, a command-line script that converts the STARSolo output of the workflow into an AnnData h5ad object that can easily be integrated into a scanpy workflow.

After the workflow is complete, use the script like so:

python make_h5ad.py --solo_indir /path/to/STARSolo/output/directory --cell_barcodes_csv /path/to/barcodecsv/output/by/starsolo/snakefile --h5ad output/h5ad/file
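From there, the object can be loaded straight into scanpy. A minimal sketch, assuming the output was written to library.h5ad (the layer names used for the spliced/unspliced matrices depend on how make_h5ad.py stores them, so inspect the object after loading):

import scanpy as sc

adata = sc.read_h5ad("library.h5ad")          # load the converted STARSolo counts
print(adata)                                  # inspect obs, var, and any layers (e.g. spliced/unspliced)
sc.pp.filter_cells(adata, min_genes=200)      # standard scanpy QC/preprocessing from here on
sc.pp.normalize_total(adata, target_sum=1e4)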

Dependencies

These workflows were tested with:

  • STAR = 2.7.11a
  • samtools = 1.18
  • gatk = 4.3.0.0
  • snakemake = 7.18.2
