Add Fixed RNA Profiling to Cellranger workflow (#365)
* update index

* update cellranger_multi with Fixed RNA Profiling

* upgrade default cellranger versions in cellranger WDLs

* update docs
yihming authored Oct 31, 2022
1 parent 1183afc commit 88cd9c9
Showing 13 changed files with 356 additions and 60 deletions.
12 changes: 6 additions & 6 deletions docs/cellranger/build_refs.rst
@@ -94,9 +94,9 @@ We provide a wrapper of ``cellranger mkref`` to build sc/snRNA-seq references. P
- Ensembl v94
-
* - cellranger_version
- cellranger version, could be: ``7.0.0``, ``6.1.2``, ``6.1.1``
- "7.0.0"
- "7.0.0"
- cellranger version, could be: 7.0.1, 7.0.0, 6.1.2, 6.1.1
- "7.0.1"
- "7.0.1"
* - docker_registry
- Docker registry to use for cellranger_workflow. Options:

@@ -320,9 +320,9 @@ We provide a wrapper of ``cellranger mkvdjref`` to build single-cell immune prof
- Ensembl v94
-
* - cellranger_version
- cellranger version, could be: 7.0.0, 6.1.2, 6.1.1
- "7.0.0"
- "7.0.0"
- cellranger version, could be: 7.0.1, 7.0.0, 6.1.2, 6.1.1
- "7.0.1"
- "7.0.1"
* - docker_registry
- Docker registry to use for cellranger_workflow. Options:

6 changes: 3 additions & 3 deletions docs/cellranger/feature_barcoding.rst
@@ -187,9 +187,9 @@ For feature barcoding data, ``cellranger_workflow`` takes Illumina outputs as in
- 0.1
- 0.1
* - cellranger_version
- cellranger version, could be 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- "7.0.0"
- "7.0.0"
- cellranger version, could be 7.0.1, 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- "7.0.1"
- "7.0.1"
* - cumulus_feature_barcoding_version
- Cumulus_feature_barcoding version for extracting feature barcode matrix. Version available: 0.11.0, 0.10.0, 0.9.0, 0.8.0, 0.7.0, 0.6.0, 0.5.0, 0.4.0, 0.3.0, 0.2.0.
- "0.11.0"
214 changes: 214 additions & 0 deletions docs/cellranger/fixed_rna_profiling.rst
@@ -0,0 +1,214 @@
Cellranger multi supports `Fixed RNA Profiling`_ since version 7.0.0.

Sample Sheet
++++++++++++++

#. **Reference** column.

   Prebuilt scRNA-seq references for FRP data processing are summarized below.

   .. list-table::
      :widths: 5 20
      :header-rows: 1

      * - Keyword
        - Description
      * - **GRCh38-2020-A**
        - Human GRCh38 (GENCODE v32/Ensembl 98)

#. *DataType* column.

   Set ``frp`` for the RNA-seq modalities of your FRP samples. For other modalities (e.g. citeseq or antibody), set their corresponding data types.

#. *ProbeSet* column.

   Preset probe set references for FRP samples:

   .. list-table::
      :widths: 5 20
      :header-rows: 1

      * - Keyword
        - Description
      * - **FRP_human_probe_v1**
        - FRP probe set for human

   If the *ProbeSet* column is not set, **FRP_human_probe_v1** is used by default.

#. *FeatureBarcodeFile* column.

   Provide the association between sample names and Probe Barcodes as follows::

      sample1,BC001|BC002,Control
      sample2,BC003|BC004,Treated

   where the third column (i.e. ``Control`` and ``Treated`` above) is optional and gives a description of each sample.

#. *Link* column.

   Set the same sample-unique link name for all modalities that should be linked together.

   If the *Link* column is not set, only RNA-seq modalities (i.e. samples of *DataType* ``frp``) are considered, and their *Sample* names are used as the *Link* names.

#. Example::

      Sample,Reference,ProbeSet,Flowcell,DataType,FeatureBarcodeFile,Link
      sample1,GRCh38-2020-A,FRP_human_probe_v1,/path/to/sample1/fastq/folder,frp,/path/to/sample1/fbf/file,
      sample2_rna,GRCh38-2020-A,FRP_human_probe_v1,/path/to/sample2/rna/fastq/folder,frp,/path/to/sample2/rna/fbf/file,sample2
      sample2_citeseq,GRCh38-2020-A,,/path/to/sample2/citeseq/fastq/folder,citeseq,/path/to/sample2/citeseq/fbf/file,sample2

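In the example above, two samples are provided. The FRP RNA-seq and CITE-seq modalities of the second sample (``sample2_rna`` and ``sample2_citeseq``) are linked by the shared *Link* name ``sample2``.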
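To illustrate the *FeatureBarcodeFile* format and the *ProbeSet* and *Link* rules described above, the following Python sketch (standard library only) formats and parses a probe barcode file, fills in the default probe set for ``frp`` rows, and groups rows that share a non-empty *Link* name. The helpers ``format_probe_barcodes``, ``parse_probe_barcodes`` and ``load_sample_sheet`` are hypothetical and not part of ``cellranger_workflow``; the sample sheet is the example above with the *ProbeSet* of ``sample1`` left blank to exercise the default.

```python
import csv
import io
from collections import defaultdict

# Default ProbeSet for frp rows, as documented above.
DEFAULT_PROBE_SET = "FRP_human_probe_v1"


def format_probe_barcodes(assignments):
    """Render {sample: (probe_barcodes, description)} as FeatureBarcodeFile lines (hypothetical helper)."""
    lines = []
    for sample, (barcodes, description) in assignments.items():
        fields = [sample, "|".join(barcodes)]
        if description:  # the third column is optional
            fields.append(description)
        lines.append(",".join(fields))
    return "\n".join(lines) + "\n"


def parse_probe_barcodes(text):
    """Parse FeatureBarcodeFile lines into {sample: (probe_barcodes, description)} (hypothetical helper)."""
    result = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        description = row[2] if len(row) > 2 else ""
        result[row[0]] = (row[1].split("|"), description)
    return result


def load_sample_sheet(text):
    """Read a sample sheet, fill the default ProbeSet of frp rows, and group rows by Link (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    links = defaultdict(list)
    for row in rows:
        if row["DataType"] == "frp" and not (row.get("ProbeSet") or "").strip():
            row["ProbeSet"] = DEFAULT_PROBE_SET
        link = (row.get("Link") or "").strip()
        if link:  # only rows with a non-empty Link are grouped here
            links[link].append(row["Sample"])
    return rows, dict(links)


fbf = format_probe_barcodes({
    "sample1": (["BC001", "BC002"], "Control"),
    "sample2": (["BC003", "BC004"], "Treated"),
})
print(fbf, end="")

sheet = """Sample,Reference,ProbeSet,Flowcell,DataType,FeatureBarcodeFile,Link
sample1,GRCh38-2020-A,,/path/to/sample1/fastq/folder,frp,/path/to/sample1/fbf/file,
sample2_rna,GRCh38-2020-A,FRP_human_probe_v1,/path/to/sample2/rna/fastq/folder,frp,/path/to/sample2/rna/fbf/file,sample2
sample2_citeseq,GRCh38-2020-A,,/path/to/sample2/citeseq/fastq/folder,citeseq,/path/to/sample2/citeseq/fbf/file,sample2
"""
rows, links = load_sample_sheet(sheet)
print(rows[0]["ProbeSet"])  # FRP_human_probe_v1
print(links)  # {'sample2': ['sample2_rna', 'sample2_citeseq']}
```

The sketch only mirrors the documented conventions for checking a sheet locally; how the workflow itself treats rows with an empty *Link* value is not modeled here.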


Workflow Input
++++++++++++++++

For FRP data, ``cellranger_workflow`` takes Illumina outputs as input and runs ``cellranger mkfastq`` and ``cellranger multi``. Relevant workflow inputs are described below, with required inputs highlighted in **bold**:

.. list-table::
   :widths: 5 30 30 20
   :header-rows: 1

   * - Name
     - Description
     - Example
     - Default
   * - **input_csv_file**
     - Sample Sheet (Sample, Reference, DataType and Flowcell are required columns; Lane and Index are required if *run_mkfastq* is ``true``; ProbeSet, FeatureBarcodeFile and Link are optional)
     - "gs://fc-e0000000-0000-0000-0000-000000000000/sample_sheet.csv"
     -
   * - **output_directory**
     - Output directory
     - "gs://fc-e0000000-0000-0000-0000-000000000000/cellranger_output"
     -
   * - run_mkfastq
     - If you want to run ``cellranger mkfastq``
     - true
     - true
   * - run_count
     - If you want to run ``cellranger multi``
     - true
     - true
   * - delete_input_bcl_directory
     - If true, delete BCL directories after demultiplexing. If false, delete this folder yourself to avoid incurring storage charges
     - false
     - false
   * - mkfastq_barcode_mismatches
     - Number of mismatches allowed in matching barcode indices (bcl2fastq2 default is 1)
     - 0
     - 1
   * - mkfastq_force_single_index
     - If 10x-supplied i7/i5 paired indices are specified, but the flowcell was run with only one sample index, allow demultiplexing to proceed using the i7 half of the sample index pair
     - false
     - false
   * - mkfastq_filter_single_index
     - Only demultiplex samples identified by an i7-only sample index, ignoring dual-indexed samples. Dual-indexed samples will not be demultiplexed
     - false
     - false
   * - mkfastq_use_bases_mask
     - Override the read lengths as specified in RunInfo.xml
     - "Y28n*,I8n*,N10,Y90n*"
     -
   * - mkfastq_delete_undetermined
     - Delete undetermined FASTQ files generated by bcl2fastq2
     - false
     - false
   * - force_cells
     - Force the pipeline to use this number of cells, bypassing the cell detection algorithm. Mutually exclusive with expect_cells. This option is used by ``cellranger multi``.
     - 6000
     -
   * - expect_cells
     - Expected number of recovered cells. Mutually exclusive with force_cells. This option is used by ``cellranger multi``.
     - 3000
     -
   * - include_introns
     - Turn this option on to also count reads mapping to intronic regions. With this option, users do not need to use pre-mRNA references. Note that if this option is set, cellranger_version must be >= 5.0.0. This option is used by ``cellranger multi``.
     - true
     - true
   * - no_bam
     - Turn this option on to disable BAM file generation. This option is only available if cellranger_version >= 5.0.0. This option is used by ``cellranger multi``.
     - false
     - false
   * - secondary
     - Perform Cell Ranger secondary analysis (dimensionality reduction, clustering, etc.). This option is used by ``cellranger multi``.
     - false
     - false
   * - cellranger_version
     - Cell Ranger version to use. Available versions working for FRP data: 7.0.1, 7.0.0.
     - "7.0.1"
     - "7.0.1"
   * - docker_registry
     - Docker registry to use for cellranger_workflow. Options:

       - "quay.io/cumulus" for images on Red Hat registry;

       - "cumulusprod" for backup images on Docker Hub.
     - "quay.io/cumulus"
     - "quay.io/cumulus"
   * - mkfastq_docker_registry
     - Docker registry to use for ``cellranger mkfastq``. Default is the registry to which only Broad users have access. See :ref:`bcl2fastq-docker` for making your own registry.
     - "gcr.io/broad-cumulus"
     - "gcr.io/broad-cumulus"
   * - acronym_file
     - | The link/path of an index file in TSV format for fetching preset genome references, probe set references, chemistry whitelists, etc. by their names.
       | Set a GS URI if *backend* is ``gcp``; an S3 URI for ``aws`` backend; an absolute file path for ``local`` backend.
     - "s3://xxxx/index.tsv"
     - "gs://regev-lab/resources/cellranger/index.tsv"
   * - zones
     - Google Cloud zones
     - "us-central1-a us-west1-a"
     - "us-central1-a us-central1-b us-central1-c us-central1-f us-east1-b us-east1-c us-east1-d us-west1-a us-west1-b us-west1-c"
   * - num_cpu
     - Number of CPUs to request per node for ``cellranger mkfastq`` and ``cellranger multi``
     - 32
     - 32
   * - memory
     - Memory size string for ``cellranger mkfastq`` and ``cellranger multi``
     - "120G"
     - "120G"
   * - mkfastq_disk_space
     - Optional disk space in GB for ``cellranger mkfastq``
     - 1500
     - 1500
   * - count_disk_space
     - Disk space in GB needed for ``cellranger multi``
     - 500
     - 500
   * - backend
     - Cloud backend for file transfer and computation. Available options:

       - "gcp" for Google Cloud;
       - "aws" for Amazon AWS;
       - "local" for local machines.
     - "gcp"
     - "gcp"
   * - preemptible
     - Number of preemptible tries
     - 2
     - 2
   * - awsQueueArn
     - The AWS ARN string of the job queue to be used. This only works for ``aws`` backend.
     - "arn:aws:batch:us-east-1:xxx:job-queue/priority-gwf"
     - ""
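When submitting through Cromwell, these inputs are usually supplied as a JSON file whose keys are prefixed with the workflow name (an assumption about your submission route). The Python sketch below builds such a mapping and checks three constraints stated in the table: ``force_cells`` and ``expect_cells`` are mutually exclusive, FRP data needs ``cellranger_version`` 7.0.1 or 7.0.0, and ``acronym_file`` must match the *backend* (GS URI, S3 URI, or absolute path). ``build_frp_inputs`` is a hypothetical helper, not part of ``cellranger_workflow``.

```python
import json

# Cell Ranger versions listed above as working for FRP data.
FRP_CELLRANGER_VERSIONS = {"7.0.1", "7.0.0"}
# Expected prefix of acronym_file for each backend, per the table above.
ACRONYM_PREFIX = {"gcp": "gs://", "aws": "s3://", "local": "/"}


def build_frp_inputs(input_csv_file, output_directory, **options):
    """Return a Cromwell-style inputs dict for cellranger_workflow (hypothetical helper)."""
    inputs = {"input_csv_file": input_csv_file, "output_directory": output_directory}
    inputs.update(options)

    if "force_cells" in inputs and "expect_cells" in inputs:
        raise ValueError("force_cells and expect_cells are mutually exclusive")

    version = inputs.get("cellranger_version", "7.0.1")  # documented default
    if version not in FRP_CELLRANGER_VERSIONS:
        raise ValueError(f"cellranger {version} is not listed as working for FRP data")

    backend = inputs.get("backend", "gcp")  # documented default
    if backend not in ACRONYM_PREFIX:
        raise ValueError(f"unknown backend: {backend}")
    acronym = inputs.get("acronym_file")
    if acronym is not None and not acronym.startswith(ACRONYM_PREFIX[backend]):
        raise ValueError(f"acronym_file should start with {ACRONYM_PREFIX[backend]!r} for {backend}")

    # Namespace every key with the workflow name, as Cromwell inputs files do.
    return {f"cellranger_workflow.{key}": value for key, value in inputs.items()}


inputs = build_frp_inputs(
    "gs://fc-e0000000-0000-0000-0000-000000000000/sample_sheet.csv",
    "gs://fc-e0000000-0000-0000-0000-000000000000/cellranger_output",
    expect_cells=3000,
    include_introns=True,
)
print(json.dumps(inputs, indent=2))
```

Options left out of the mapping fall back to the defaults in the table, so the sketch only writes what differs from them.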

Workflow Output
+++++++++++++++++

See the table below for important outputs:

.. list-table::
   :widths: 5 5 10
   :header-rows: 1

   * - Name
     - Type
     - Description
   * - fastq_outputs
     - Array[Array[String]]
     - ``fastq_outputs[0]`` gives the list of cloud URLs containing FASTQ files for RNA-seq modalities of FRP data, one URL per flowcell.
   * - count_outputs
     - Map[String, Array[String]]
     - ``count_outputs["multi"]`` gives the list of cloud URLs containing *cellranger multi* outputs, one URL per sample.
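To pick these outputs out after a run, the sketch below reads a mapping of fully qualified output names to values, such as the ``outputs`` object returned by Cromwell's workflow outputs endpoint (an assumption about how results are retrieved). ``frp_output_urls`` and all URLs in the example are hypothetical.

```python
def frp_output_urls(outputs, workflow="cellranger_workflow"):
    """Return (FRP FASTQ URLs, cellranger multi URLs) from an outputs mapping (hypothetical helper)."""
    fastq_outputs = outputs.get(f"{workflow}.fastq_outputs") or []
    count_outputs = outputs.get(f"{workflow}.count_outputs") or {}
    # fastq_outputs[0]: FASTQ locations for the FRP RNA-seq modalities, one per flowcell.
    frp_fastqs = fastq_outputs[0] if fastq_outputs else []
    # count_outputs["multi"]: cellranger multi output locations, one per sample.
    multi_dirs = count_outputs.get("multi", [])
    return frp_fastqs, multi_dirs


# Hypothetical outputs of one run with a single flowcell and two samples.
example_outputs = {
    "cellranger_workflow.fastq_outputs": [
        ["gs://fc-e0000000-0000-0000-0000-000000000000/cellranger_output/fastq/flowcell1"],
    ],
    "cellranger_workflow.count_outputs": {
        "multi": [
            "gs://fc-e0000000-0000-0000-0000-000000000000/cellranger_output/sample1",
            "gs://fc-e0000000-0000-0000-0000-000000000000/cellranger_output/sample2",
        ],
    },
}
fastqs, multi = frp_output_urls(example_outputs)
for url in fastqs + multi:
    print(url)
```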


.. _Fixed RNA Profiling: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/multi-frp
5 changes: 4 additions & 1 deletion docs/cellranger/general_steps.rst
@@ -85,7 +85,10 @@ Alternatively, users can submit jobs through command line interface (CLI) using
| **cmo** refers to cell multiplexing oligos used in 10x Genomics' CellPlex assay,
| **crispr** refers to Perturb-seq guide tag data,
| **atac** refers to scATAC-Seq data (*cellranger-atac count*),
| **frp** refers to Fixed RNA Profiling (FRP) gene expression data,
| This column is optional and the default data type is *rna*.
* - ProbeSet
- Probe set reference for FRP samples. Currently ``FRP_human_probe_v1`` is the only available reference and is thus the default. Only applies to samples of *DataType* ``frp``.
* - FeatureBarcodeFile
-
| Google bucket urls pointing to feature barcode files for *rna*, *citeseq*, *hashing*, *cmo* and *crispr* data.
@@ -94,7 +97,7 @@ Alternatively, users can submit jobs through command line interface (CLI) using
| This column is only required for targeted gene expression analysis (*rna*), CITE-Seq (*citeseq*), cell-hashing or nucleus-hashing (*hashing*), CellPlex (*cmo*) and Perturb-seq (*crispr*).
* - Link
-
| Designed for Single Cell Multiome ATAC + Gene Expression, Feature Barcoding, or CellPlex.
| Designed for Single Cell Multiome ATAC + Gene Expression, Feature Barcoding, CellPlex, or FRP.
| Link multiple modalities together using a single link name.
| cellranger-arc count, cellranger count, or cellranger multi will be triggered automatically depending on the modalities.
| If empty string is provided, no link is assumed.
7 changes: 7 additions & 0 deletions docs/cellranger/index.rst
@@ -43,6 +43,13 @@ Single-cell multiomics

.. include:: sc_multiomics.rst

--------------------------

Fixed RNA Profiling
^^^^^^^^^^^^^^^^^^^^

.. include:: fixed_rna_profiling.rst

---------------------------------

Build Cell Ranger References
12 changes: 6 additions & 6 deletions docs/cellranger/sc_multiomics.rst
@@ -151,13 +151,13 @@ For single-cell multiomics data, ``cellranger_workflow`` takes Illumina outputs
- "gs://fc-e0000000-0000-0000-0000-000000000000/cmo_set.csv"
-
* - cellranger_version
- cellranger version, could be 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- "7.0.0"
- "7.0.0"
- cellranger version, could be 7.0.1, 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- "7.0.1"
- "7.0.1"
* - cellranger_arc_version
- cellranger-arc version, could be 2.0.1, 2.0.0, 1.0.1, 1.0.0
- "2.0.1"
- "2.0.1"
- cellranger-arc version, could be 2.0.2, 2.0.1, 2.0.0, 1.0.1, 1.0.0
- "2.0.2"
- "2.0.2"
* - docker_registry
- Docker registry to use for cellranger_workflow. Options:

12 changes: 6 additions & 6 deletions docs/cellranger/sc_sn_rnaseq.rst
@@ -181,13 +181,13 @@ For sc/snRNA-seq data, ``cellranger_workflow`` takes Illumina outputs as input a
- false
- false
* - cellranger_version
- cellranger version, could be: 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- "7.0.0"
- "7.0.0"
- cellranger version, could be: 7.0.1, 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- "7.0.1"
- "7.0.1"
* - config_version
- config docker version used for processing sample sheets, could be 0.2, 0.1
- "0.2"
- "0.2"
- config docker version used for processing sample sheets, could be 0.3, 0.2, 0.1
- "0.3"
- "0.3"
* - docker_registry
- Docker registry to use for cellranger_workflow. Options:

6 changes: 3 additions & 3 deletions docs/cellranger/sc_vdj.rst
@@ -123,9 +123,9 @@ For scIR-seq data, ``cellranger_workflow`` takes Illumina outputs as input and r
- "auto"
- "auto"
* - cellranger_version
- cellranger version, could be 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- "7.0.0"
- "7.0.0"
- cellranger version, could be 7.0.1, 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- "7.0.1"
- "7.0.1"
* - docker_registry
- Docker registry to use for cellranger_workflow. Options:

4 changes: 2 additions & 2 deletions workflows/cellranger/cellranger_create_reference.wdl
@@ -22,8 +22,8 @@ workflow cellranger_create_reference {

# Which docker registry to use
String docker_registry = "quay.io/cumulus"
# 7.0.0, 6.1.2, 6.1.1
String cellranger_version = "7.0.0"
# 7.0.1, 7.0.0, 6.1.2, 6.1.1
String cellranger_version = "7.0.1"

# Disk space in GB
Int disk_space = 100