Merge pull request #430 from lilab-bcb/yiming
Cellranger support TAR input and remove mkfastq step
yihming authored Feb 17, 2025
2 parents f1055e6 + 4d25800 commit 219b857
Showing 9 changed files with 304 additions and 650 deletions.
10 changes: 2 additions & 8 deletions docs/cellranger/feature_barcoding.rst
@@ -193,11 +193,11 @@ For feature barcoding data, ``cellranger_workflow`` takes Illumina outputs as in
- 0.1
- 0.1
* - cellranger_version
- cellranger version, could be: 9.0.0, 8.0.1, 8.0.0, 7.2.0, 7.1.0, 7.0.1, 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- cellranger version, could be: 9.0.0, 8.0.1, 8.0.0, 7.2.0, 7.1.0, 7.0.1, 7.0.0
- "9.0.0"
- "9.0.0"
* - cumulus_feature_barcoding_version
- Cumulus_feature_barcoding version for extracting feature barcode matrix. Version available: 0.11.4, 0.11.3, 0.11.2, 0.11.1, 0.11.0, 0.10.0, 0.9.0, 0.8.0, 0.7.0, 0.6.0, 0.5.0, 0.4.0, 0.3.0, 0.2.0.
- Cumulus_feature_barcoding version for extracting feature barcode matrix.
- "0.11.4"
- "0.11.4"
* - docker_registry
@@ -208,12 +208,6 @@ For feature barcoding data, ``cellranger_workflow`` takes Illumina outputs as in
- "cumulusprod" for backup images on Docker Hub.
- "quay.io/cumulus"
- "quay.io/cumulus"
* - mkfastq_docker_registry
- Docker registry to use for ``cellranger mkfastq``.
Default is the registry to which only Broad users have access.
See :ref:`bcl2fastq-docker` for making your own registry.
- "gcr.io/broad-cumulus"
- "gcr.io/broad-cumulus"
* - acronym_file
- | The link/path of an index file in TSV format for fetching preset genome references, chemistry whitelists, etc. by their names.
| Set an GS URI if *backend* is ``gcp``; an S3 URI for ``aws`` backend; an absolute file path for ``local`` backend.
27 changes: 9 additions & 18 deletions docs/cellranger/general_steps.rst
@@ -66,12 +66,6 @@ Alternatively, users can submit jobs through command line interface (CLI) using
| If starts with FASTQ files, this should be Google bucket URLs of uploaded FASTQ folders.
| The FASTQ folders should contain one subfolder for each sample in the flowcell with the sample name as the subfolder name.
| Each subfolder contains FASTQ files for that sample.
* - **Lane**
-
| Tells which lanes the sample was pooled into.
| Can be either single lane (e.g. 8) or a range (e.g. 7-8) or all (e.g. \*).
* - **Index**
- Sample index (e.g. SI-GA-A12).
* - Chemistry
- Describes the 10x chemistry used for the sample. This column is optional.
* - DataType
@@ -108,15 +102,15 @@ Alternatively, users can submit jobs through command line interface (CLI) using

Example::

Sample,Reference,Flowcell,Lane,Index,Chemistry,DataType
sample_1,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,1-2,SI-GA-A8,threeprime,rna
sample_2,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,3-4,SI-GA-B8,SC3Pv3,rna
sample_3,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,5-6,SI-GA-C8,fiveprime,rna
sample_4,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,7-8,SI-GA-D8,fiveprime,rna
sample_1,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,1-2,SI-GA-A8,threeprime,rna
sample_2,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,3-4,SI-GA-B8,SC3Pv3,rna
sample_3,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,5-6,SI-GA-C8,fiveprime,rna
sample_4,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,7-8,SI-GA-D8,fiveprime,rna
Sample,Reference,Flowcell,Chemistry,DataType
sample_1,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,threeprime,rna
sample_2,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,SC3Pv3,rna
sample_3,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,fiveprime,rna
sample_4,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,fiveprime,rna
sample_1,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,threeprime,rna
sample_2,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,SC3Pv3,rna
sample_3,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,fiveprime,rna
sample_4,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,fiveprime,rna

**3.2 Upload your sample sheet to the workspace bucket:**

@@ -183,9 +177,6 @@ Alternatively, users can submit jobs through command line interface (CLI) using
* - Name
- Type
- Description
* - fastq_outputs
- Array[Array[String]?]
- The top-level array contains results (as arrays) for different data modalities. The inner-level array contains cloud locations of FASTQ files, one url per flowcell.
* - count_outputs
- Array[Array[String]?]
- The top-level array contains results (as arrays) for different data modalities. The inner-level array contains cloud locations of count matrices, one url per sample.
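The updated sample sheet example in general_steps.rst no longer has the Lane and Index columns. Below is a minimal sketch, assuming an older sheet is plain comma-separated text with a header row, of rewriting it to the new column layout. `drop_legacy_columns` is a hypothetical helper, not part of ``cellranger_workflow``, and this diff does not say whether the workflow rejects or silently ignores leftover Lane/Index columns.

```python
import csv
import io

# Columns that the updated sample sheet example no longer lists.
DROPPED_COLUMNS = {"Lane", "Index"}


def drop_legacy_columns(sheet_text: str) -> str:
    """Return a copy of a CSV sample sheet without the Lane and Index columns.

    Hypothetical helper: it only removes columns and does not validate
    the remaining values (Reference, Flowcell, Chemistry, DataType).
    """
    reader = csv.DictReader(io.StringIO(sheet_text))
    kept = [name for name in (reader.fieldnames or []) if name not in DROPPED_COLUMNS]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the Lane/Index values in each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Rows are otherwise copied unchanged, so a sample sequenced on several flowcells still appears once per flowcell, as in the new example.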
12 changes: 6 additions & 6 deletions docs/cellranger/index.rst
@@ -24,17 +24,17 @@ Feature barcoding assays (cell & nucleus hashing, CITE-seq and Perturb-seq)

---------------------------------

Single-cell ATAC-seq
^^^^^^^^^^^^^^^^^^^^
Single-cell immune profiling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: sc_atac.rst
.. include:: sc_vdj.rst

---------------------------------

Single-cell immune profiling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Single-cell ATAC-seq
^^^^^^^^^^^^^^^^^^^^

.. include:: sc_vdj.rst
.. include:: sc_atac.rst

---------------------------------

24 changes: 0 additions & 24 deletions docs/cellranger/sc_atac.rst
@@ -19,30 +19,6 @@ Sample sheet
- Mouse mm10, cellranger-arc/atac reference 2.0.0
* - **GRCh38_and_mm10-2020-A_atac_v2.0.0**
- Human GRCh38 and mouse mm10, cellranger-atac reference 2.0.0
* - **GRCh38_atac_v1.2.0**
- Human GRCh38, cellranger-atac reference 1.2.0
* - **mm10_atac_v1.2.0**
- Mouse mm10, cellranger-atac reference 1.2.0
* - **hg19_atac_v1.2.0**
- Human hg19, cellranger-atac reference 1.2.0
* - **b37_atac_v1.2.0**
- Human b37 build, cellranger-atac reference 1.2.0
* - **GRCh38_and_mm10_atac_v1.2.0**
- Human GRCh38 and mouse mm10, cellranger-atac reference 1.2.0
* - **hg19_and_mm10_atac_v1.2.0**
- Human hg19 and mouse mm10, cellranger-atac reference 1.2.0
* - **GRCh38_atac_v1.1.0**
- Human GRCh38, cellranger-atac reference 1.1.0
* - **mm10_atac_v1.1.0**
- Mouse mm10, cellranger-atac reference 1.1.0
* - **hg19_atac_v1.1.0**
- Human hg19, cellranger-atac reference 1.1.0
* - **b37_atac_v1.1.0**
- Human b37 build, cellranger-atac reference 1.1.0
* - **GRCh38_and_mm10_atac_v1.1.0**
- Human GRCh38 and mouse mm10, cellranger-atac reference 1.1.0
* - **hg19_and_mm10_atac_v1.1.0**
- Human hg19 and mouse mm10, cellranger-atac reference 1.1.0

#. **Index** column.

96 changes: 4 additions & 92 deletions docs/cellranger/sc_sn_rnaseq.rst
@@ -25,49 +25,6 @@ Sample sheet
- Mouse mm10 (GENCODE vM23/Ensembl 98)
* - **GRCh38_and_mm10-2020-A**
- Human GRCh38 (GENCODE v32/Ensembl 98) and mouse mm10 (GENCODE vM23/Ensembl 98)
* - **GRCh38_v3.0.0**
- Human GRCh38, cellranger reference 3.0.0, Ensembl v93 gene annotation
* - **hg19_v3.0.0**
- Human hg19, cellranger reference 3.0.0, Ensembl v87 gene annotation
* - **mm10_v3.0.0**
- Mouse mm10, cellranger reference 3.0.0, Ensembl v93 gene annotation
* - **GRCh38_and_mm10_v3.1.0**
- Human (GRCh38) and mouse (mm10), cellranger references 3.1.0, Ensembl v93 gene annotations for both human and mouse
* - **hg19_and_mm10_v3.0.0**
- Human (hg19) and mouse (mm10), cellranger reference 3.0.0, Ensembl v93 gene annotations for both human and mouse
* - **GRCh38_v1.2.0** or **GRCh38**
- Human GRCh38, cellranger reference 1.2.0, Ensembl v84 gene annotation
* - **hg19_v1.2.0** or **hg19**
- Human hg19, cellranger reference 1.2.0, Ensembl v82 gene annotation
* - **mm10_v1.2.0** or **mm10**
- Mouse mm10, cellranger reference 1.2.0, Ensembl v84 gene annotation
* - **GRCh38_and_mm10_v1.2.0** or **GRCh38_and_mm10**
- Human and mouse, built from GRCh38 and mm10 cellranger references, Ensembl v84 gene annotations are used
* - **GRCh38_and_SARSCoV2**
- Human GRCh38 and SARS-COV-2 RNA genome, cellranger reference 3.0.0, generated by `Carly Ziegler`_. The SARS-COV-2 viral sequence and gtf are as described in `[Kim et al. Cell 2020]`_ (https://github.com/hyeshik/sars-cov-2-transcriptome, BetaCov/South Korea/KCDC03/2020 based on NC_045512.2). The GTF was edited to include only CDS regions, and regions were added to describe the 5' UTR ("SARSCoV2_5prime"), the 3' UTR ("SARSCoV2_3prime"), and reads aligning to anywhere within the Negative Strand("SARSCoV2_NegStrand"). Additionally, trailing A's at the 3' end of the virus were excluded from the SARSCoV2 fasta, as these were found to drive spurious viral alignment in pre-COVID19 samples.

Pre-built snRNA-seq references are summarized below.

.. list-table::
:widths: 5 20
:header-rows: 1

* - Keyword
- Description
* - **GRCh38_premrna_v3.0.0**
- Human, introns included, built from GRCh38 cellranger reference 3.0.0, Ensembl v93 gene annotation, treating annotated transcripts as exons
* - **GRCh38_premrna_v1.2.0** or **GRCh38_premrna**
- Human, introns included, built from GRCh38 cellranger reference 1.2.0, Ensembl v84 gene annotation, treating annotated transcripts as exons
* - **mm10_premrna_v1.2.0** or **mm10_premrna**
- Mouse, introns included, built from mm10 cellranger reference 1.2.0, Ensembl v84 gene annotation, treating annotated transcripts as exons
* - **GRCh38_premrna_and_mm10_premrna_v1.2.0** or **GRCh38_premrna_and_mm10_premrna**
- Human and mouse, introns included, built from GRCh38_premrna_v1.2.0 and mm10_premrna_v1.2.0
* - **GRCh38_premrna_and_SARSCoV2**
- Human, introns included, built from GRCh38_premrna_v3.0.0, and SARS-COV-2 RNA genome. This reference was generated by `Carly Ziegler`_. The SARS-COV-2 RNA genome is from `[Kim et al. Cell 2020]`_ (https://github.com/hyeshik/sars-cov-2-transcriptome, BetaCov/South Korea/KCDC03/2020 based on NC_045512.2). Please see the description of *GRCh38_and_SARSCoV2* above for details.

#. **Index** column.

Put `10x single cell RNA-seq sample index set names`_ (e.g. SI-GA-A12) here.

#. *Chemistry* column.

@@ -85,22 +42,9 @@ Sample sheet
- Single Cell 3′
* - **fiveprime**
- Single Cell 5′
* - **SC3Pv1**
- Single Cell 3′ v1
* - **SC3Pv2**
- Single Cell 3′ v2
* - **SC3Pv3**
- Single Cell 3′ v3. You should set cellranger version input parameter to >= 3.0.2
* - **SC3Pv4**
- Single Cell 3' v4. **Notice:** This is GEM-X chemistry, and only works for Cell Ranger v8.0.0+
* - **SC5P-PE**
- Single Cell 5′ paired-end (both R1 and R2 are used for alignment)
* - **SC5P-PE-v3**
- Single Cell 5' paired-end v3 (both R1 and R2 are used for alignment). **Notice:** This is GEM-X chemistry, and only works for Cell Ranger v8.0.0+
* - **SC5P-R2**
- Single Cell 5′ R2-only (where only R2 is used for alignment)
* - **SC5P-R2-v3**
- Single Cell 5' R2-only v3 (where only R2 is used for alignment). **Notice:** This is GEM-X chemistry, and only works for Cell Rangrer v8.0.0+

#. *Flowcell* column.


#. *DataType* column.

@@ -140,38 +84,6 @@ For sc/snRNA-seq data, ``cellranger_workflow`` takes Illumina outputs as input a
- Output directory
- "gs://fc-e0000000-0000-0000-0000-000000000000/cellranger_output"
- Results are written under directory *output_directory* and will overwrite any existing files at this location.
* - run_mkfastq
- If you want to run ``cellranger mkfastq``
- true
- true
* - run_count
- If you want to run ``cellranger count``
- true
- true
* - delete_input_bcl_directory
- If delete BCL directories after demux. If false, you should delete this folder yourself so as to not incur storage charges
- false
- false
* - mkfastq_barcode_mismatches
- Number of mismatches allowed in matching barcode indices (bcl2fastq2 default is 1)
- 0
-
* - mkfastq_force_single_index
- If 10x-supplied i7/i5 paired indices are specified, but the flowcell was run with only one sample index, allow the demultiplex to proceed using the i7 half of the sample index pair
- false
- false
* - mkfastq_filter_single_index
- Only demultiplex samples identified by an i7-only sample index, ignoring dual-indexed samples. Dual-indexed samples will not be demultiplexed
- false
- false
* - mkfastq_use_bases_mask
- Override the read lengths as specified in *RunInfo.xml*
- "Y28n*,I8n*,N10,Y90n*"
-
* - mkfastq_delete_undetermined
- Delete undetermined FASTQ files generated by bcl2fastq2
- true
- false
* - force_cells
- Force pipeline to use this number of cells, bypassing the cell detection algorithm, mutually exclusive with expect_cells
- 6000
@@ -193,7 +105,7 @@ For sc/snRNA-seq data, ``cellranger_workflow`` takes Illumina outputs as input a
- false
- false
* - cellranger_version
- cellranger version, could be: 9.0.0, 8.0.1, 8.0.0, 7.2.0, 7.1.0, 7.0.1, 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- cellranger version, could be: 9.0.0, 8.0.1, 8.0.0, 7.2.0, 7.1.0, 7.0.1, 7.0.0
- "9.0.0"
- "9.0.0"
* - config_version
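This commit deletes the documentation rows for ``run_mkfastq``, ``run_count``, ``delete_input_bcl_directory`` and the ``mkfastq_*`` inputs in sc_sn_rnaseq.rst, plus ``mkfastq_docker_registry`` in feature_barcoding.rst. As a sketch, assuming a Cromwell/Terra-style inputs JSON whose keys look like `cellranger_workflow.<input_name>` (an assumption, not shown in this diff), the hypothetical `find_removed_inputs` below lists keys that still use one of those names so they can be reviewed before resubmitting.

```python
import json

# Input names whose documentation rows were deleted in this commit.
REMOVED_INPUTS = {
    "run_mkfastq",
    "run_count",
    "delete_input_bcl_directory",
    "mkfastq_barcode_mismatches",
    "mkfastq_force_single_index",
    "mkfastq_filter_single_index",
    "mkfastq_use_bases_mask",
    "mkfastq_delete_undetermined",
    "mkfastq_docker_registry",
}


def find_removed_inputs(inputs_json: str) -> list[str]:
    """List keys in an inputs JSON whose input name is no longer documented.

    Assumes keys look like "<workflow>.<input>", e.g.
    "cellranger_workflow.run_mkfastq"; only the part after the last dot
    is compared against REMOVED_INPUTS.
    """
    inputs = json.loads(inputs_json)
    return sorted(key for key in inputs if key.rsplit(".", 1)[-1] in REMOVED_INPUTS)
```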
16 changes: 0 additions & 16 deletions docs/cellranger/sc_vdj.rst
@@ -19,22 +19,6 @@ Sample sheet
- Human GRCh38 V(D)J sequences, cellranger reference 7.0.0, annotation built from Ensembl *Homo_sapiens.GRCh38.94.chr_patch_hapl_scaff.gtf*
* - **GRCm38_vdj_v7.0.0**
- Mouse GRCm38 V(D)J sequences, cellranger reference 7.0.0, annotation built from Ensembl *Mus_musculus.GRCm38.94.gtf*
* - **GRCh38_vdj_v5.0.0**
- Human GRCh38 V(D)J sequences, cellranger reference 5.0.0, annotation built from Ensembl *Homo_sapiens.GRCh38.94.chr_patch_hapl_scaff.gtf*
* - **GRCm38_vdj_v5.0.0**
- Mouse GRCm38 V(D)J sequences, cellranger reference 5.0.0, annotation built from Ensembl *Mus_musculus.GRCm38.94.gtf*
* - **GRCh38_vdj_v4.0.0**
- Human GRCh38 V(D)J sequences, cellranger reference 4.0.0, annotation built from Ensembl *Homo_sapiens.GRCh38.94.chr_patch_hapl_scaff.gtf*
* - **GRCm38_vdj_v4.0.0**
- Mouse GRCm38 V(D)J sequences, cellranger reference 4.0.0, annotation built from Ensembl *Mus_musculus.GRCm38.94.gtf*
* - **GRCh38_vdj_v3.1.0**
- Human GRCh38 V(D)J sequences, cellranger reference 3.1.0, annotation built from Ensembl *Homo_sapiens.GRCh38.94.chr_patch_hapl_scaff.gtf*
* - **GRCm38_vdj_v3.1.0**
- Mouse GRCm38 V(D)J sequences, cellranger reference 3.1.0, annotation built from Ensembl *Mus_musculus.GRCm38.94.gtf*
* - **GRCh38_vdj_v2.0.0** or **GRCh38_vdj**
- Human GRCh38 V(D)J sequences, cellranger reference 2.0.0, annotation built from Ensembl *Homo_sapiens.GRCh38.87.chr_patch_hapl_scaff.gtf* and *vdj_GRCh38_alts_ensembl_10x_genes-2.0.0.gtf*
* - **GRCm38_vdj_v2.2.0** or **GRCm38_vdj**
- Mouse GRCm38 V(D)J sequences, cellranger reference 2.2.0, annotation built from Ensembl *Mus_musculus.GRCm38.90.chr_patch_hapl_scaff.gtf*

#. **Index** column.

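The reference tables in sc_atac.rst, sc_sn_rnaseq.rst and sc_vdj.rst lose their older keywords in this commit. A minimal sketch of scanning a sample sheet's Reference column for those keywords follows; the keyword sets are copied from the deleted rows, and `find_legacy_references` is a hypothetical helper. Removal from the docs does not by itself show that the workflow can no longer resolve a keyword, so a hit is something to check against the acronym file rather than a guaranteed error.

```python
import csv
import io

# Keywords whose rows were deleted from sc_atac.rst.
LEGACY_ATAC = {
    f"{genome}_atac_v{version}"
    for genome in ("GRCh38", "mm10", "hg19", "b37", "GRCh38_and_mm10", "hg19_and_mm10")
    for version in ("1.2.0", "1.1.0")
}

# Keywords whose rows were deleted from the sc/snRNA-seq tables in sc_sn_rnaseq.rst.
LEGACY_RNA = {
    "GRCh38_v3.0.0", "hg19_v3.0.0", "mm10_v3.0.0",
    "GRCh38_and_mm10_v3.1.0", "hg19_and_mm10_v3.0.0",
    "GRCh38_v1.2.0", "GRCh38", "hg19_v1.2.0", "hg19", "mm10_v1.2.0", "mm10",
    "GRCh38_and_mm10_v1.2.0", "GRCh38_and_mm10", "GRCh38_and_SARSCoV2",
    "GRCh38_premrna_v3.0.0", "GRCh38_premrna_v1.2.0", "GRCh38_premrna",
    "mm10_premrna_v1.2.0", "mm10_premrna",
    "GRCh38_premrna_and_mm10_premrna_v1.2.0", "GRCh38_premrna_and_mm10_premrna",
    "GRCh38_premrna_and_SARSCoV2",
}

# Keywords whose rows were deleted from sc_vdj.rst.
LEGACY_VDJ = {
    f"{genome}_vdj_v{version}"
    for genome in ("GRCh38", "GRCm38")
    for version in ("5.0.0", "4.0.0", "3.1.0")
} | {"GRCh38_vdj_v2.0.0", "GRCh38_vdj", "GRCm38_vdj_v2.2.0", "GRCm38_vdj"}

LEGACY_REFERENCES = LEGACY_ATAC | LEGACY_RNA | LEGACY_VDJ


def find_legacy_references(sheet_text: str) -> list[tuple[str, str]]:
    """Return (Sample, Reference) pairs whose Reference is a removed keyword."""
    hits = []
    for row in csv.DictReader(io.StringIO(sheet_text)):
        reference = (row.get("Reference") or "").strip()
        if reference in LEGACY_REFERENCES:
            hits.append((row.get("Sample") or "", reference))
    return hits
```

Exact string matching keeps the current keywords such as ``GRCh38-2020-A`` or ``GRCh38_and_mm10-2020-A_atac_v2.0.0`` from being flagged, even though they share prefixes with removed ones.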
2 changes: 1 addition & 1 deletion docs/spaceranger.rst
@@ -276,7 +276,7 @@ For spatial data, ``spaceranger_workflow`` takes Illumina outputs and related im
- 50
-
* - spaceranger_version
- spaceranger version, could be: 3.1.2, 3.0.1, 3.0.0, 2.1.1, 2.0.1, 2.0.0, 1.3.1, 1.3.0
- spaceranger version, could be: 3.1.2, 3.0.1, 3.0.0
- "3.1.2"
- "3.1.2"
* - config_version
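Both ``cellranger_version`` and ``spaceranger_version`` now document shorter lists of accepted values. The sketch below, built around a hypothetical `check_version` helper, fails early on a version that is no longer listed; the tuples are copied from the updated docs and would need to be kept in sync with them by hand.

```python
# Versions listed in the updated docs (feature_barcoding.rst, sc_sn_rnaseq.rst, spaceranger.rst).
SUPPORTED_VERSIONS = {
    "cellranger": ("9.0.0", "8.0.1", "8.0.0", "7.2.0", "7.1.0", "7.0.1", "7.0.0"),
    "spaceranger": ("3.1.2", "3.0.1", "3.0.0"),
}


def check_version(tool: str, version: str) -> None:
    """Raise ValueError if `version` is not in the documented list for `tool`."""
    supported = SUPPORTED_VERSIONS[tool]
    if version not in supported:
        raise ValueError(
            f"{tool} {version} is not a documented option; choose one of: {', '.join(supported)}"
        )
```

An explicit allow-list is used instead of a minimum-version comparison because the docs enumerate individual releases rather than stating a range.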