Cellranger support TAR input and remove mkfastq step #430

Merged 7 commits on Feb 17, 2025
10 changes: 2 additions & 8 deletions docs/cellranger/feature_barcoding.rst
@@ -193,11 +193,11 @@ For feature barcoding data, ``cellranger_workflow`` takes Illumina outputs as in
- 0.1
- 0.1
* - cellranger_version
- cellranger version, could be: 9.0.0, 8.0.1, 8.0.0, 7.2.0, 7.1.0, 7.0.1, 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- cellranger version, could be: 9.0.0, 8.0.1, 8.0.0, 7.2.0, 7.1.0, 7.0.1, 7.0.0
- "9.0.0"
- "9.0.0"
* - cumulus_feature_barcoding_version
- Cumulus_feature_barcoding version for extracting feature barcode matrix. Version available: 0.11.4, 0.11.3, 0.11.2, 0.11.1, 0.11.0, 0.10.0, 0.9.0, 0.8.0, 0.7.0, 0.6.0, 0.5.0, 0.4.0, 0.3.0, 0.2.0.
- Cumulus_feature_barcoding version for extracting feature barcode matrix.
- "0.11.4"
- "0.11.4"
* - docker_registry
@@ -208,12 +208,6 @@ For feature barcoding data, ``cellranger_workflow`` takes Illumina outputs as in
- "cumulusprod" for backup images on Docker Hub.
- "quay.io/cumulus"
- "quay.io/cumulus"
* - mkfastq_docker_registry
- Docker registry to use for ``cellranger mkfastq``.
Default is the registry to which only Broad users have access.
See :ref:`bcl2fastq-docker` for making your own registry.
- "gcr.io/broad-cumulus"
- "gcr.io/broad-cumulus"
* - acronym_file
- | The link/path of an index file in TSV format for fetching preset genome references, chemistry whitelists, etc. by their names.
| Set a GS URI if *backend* is ``gcp``; an S3 URI for ``aws`` backend; an absolute file path for ``local`` backend.
27 changes: 9 additions & 18 deletions docs/cellranger/general_steps.rst
@@ -66,12 +66,6 @@ Alternatively, users can submit jobs through command line interface (CLI) using
| If starts with FASTQ files, this should be Google bucket URLs of uploaded FASTQ folders.
| The FASTQ folders should contain one subfolder for each sample in the flowcell with the sample name as the subfolder name.
| Each subfolder contains FASTQ files for that sample.
* - **Lane**
-
| Tells which lanes the sample was pooled into.
| Can be either single lane (e.g. 8) or a range (e.g. 7-8) or all (e.g. \*).
* - **Index**
- Sample index (e.g. SI-GA-A12).
* - Chemistry
- Describes the 10x chemistry used for the sample. This column is optional.
* - DataType
@@ -108,15 +102,15 @@ Alternatively, users can submit jobs through command line interface (CLI) using

Example::

Sample,Reference,Flowcell,Lane,Index,Chemistry,DataType
sample_1,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,1-2,SI-GA-A8,threeprime,rna
sample_2,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,3-4,SI-GA-B8,SC3Pv3,rna
sample_3,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,5-6,SI-GA-C8,fiveprime,rna
sample_4,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,7-8,SI-GA-D8,fiveprime,rna
sample_1,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,1-2,SI-GA-A8,threeprime,rna
sample_2,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,3-4,SI-GA-B8,SC3Pv3,rna
sample_3,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,5-6,SI-GA-C8,fiveprime,rna
sample_4,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,7-8,SI-GA-D8,fiveprime,rna
Sample,Reference,Flowcell,Chemistry,DataType
sample_1,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,threeprime,rna
sample_2,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,SC3Pv3,rna
sample_3,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,fiveprime,rna
sample_4,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,fiveprime,rna
sample_1,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,threeprime,rna
sample_2,GRCh38-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,SC3Pv3,rna
sample_3,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,fiveprime,rna
sample_4,mm10-2020-A,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2,fiveprime,rna
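The example above drops the ``Lane`` and ``Index`` columns from the sample sheet. As a rough illustration only, an existing sheet could be converted along these lines. The helper name ``drop_legacy_columns`` is hypothetical and not part of Cumulus, and the sketch assumes a plain comma-separated sheet with a header row. It does not change the ``Flowcell`` entries, which may still need to point at FASTQ folders.

```python
import csv
import io

# Columns this change removes from the sample sheet (assumption: no other
# columns need to be touched when migrating an older sheet).
LEGACY_COLUMNS = {"Lane", "Index"}


def drop_legacy_columns(sheet_text: str) -> str:
    """Return the CSV sample sheet with the Lane and Index columns removed."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    # Keep the remaining columns in their original order.
    kept = [name for name in (reader.fieldnames or []) if name not in LEGACY_COLUMNS]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # extrasaction="ignore" silently skips the dropped columns.
        writer.writerow(row)
    return out.getvalue()


old_sheet = (
    "Sample,Reference,Flowcell,Lane,Index,Chemistry,DataType\n"
    "sample_1,GRCh38-2020-A,"
    "gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4,"
    "1-2,SI-GA-A8,threeprime,rna\n"
)
new_sheet = drop_legacy_columns(old_sheet)
print(new_sheet)
```

Sheets that already lack the two columns pass through unchanged, so the helper can be run on a mix of old and new files.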

**3.2 Upload your sample sheet to the workspace bucket:**

@@ -183,9 +177,6 @@ Alternatively, users can submit jobs through command line interface (CLI) using
* - Name
- Type
- Description
* - fastq_outputs
- Array[Array[String]?]
- The top-level array contains results (as arrays) for different data modalities. The inner-level array contains cloud locations of FASTQ files, one url per flowcell.
* - count_outputs
- Array[Array[String]?]
- The top-level array contains results (as arrays) for different data modalities. The inner-level array contains cloud locations of count matrices, one url per sample.
12 changes: 6 additions & 6 deletions docs/cellranger/index.rst
@@ -24,17 +24,17 @@ Feature barcoding assays (cell & nucleus hashing, CITE-seq and Perturb-seq)

---------------------------------

Single-cell ATAC-seq
^^^^^^^^^^^^^^^^^^^^
Single-cell immune profiling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: sc_atac.rst
.. include:: sc_vdj.rst

---------------------------------

Single-cell immune profiling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Single-cell ATAC-seq
^^^^^^^^^^^^^^^^^^^^

.. include:: sc_vdj.rst
.. include:: sc_atac.rst

---------------------------------

24 changes: 0 additions & 24 deletions docs/cellranger/sc_atac.rst
@@ -19,30 +19,6 @@ Sample sheet
- Mouse mm10, cellranger-arc/atac reference 2.0.0
* - **GRCh38_and_mm10-2020-A_atac_v2.0.0**
- Human GRCh38 and mouse mm10, cellranger-atac reference 2.0.0
* - **GRCh38_atac_v1.2.0**
- Human GRCh38, cellranger-atac reference 1.2.0
* - **mm10_atac_v1.2.0**
- Mouse mm10, cellranger-atac reference 1.2.0
* - **hg19_atac_v1.2.0**
- Human hg19, cellranger-atac reference 1.2.0
* - **b37_atac_v1.2.0**
- Human b37 build, cellranger-atac reference 1.2.0
* - **GRCh38_and_mm10_atac_v1.2.0**
- Human GRCh38 and mouse mm10, cellranger-atac reference 1.2.0
* - **hg19_and_mm10_atac_v1.2.0**
- Human hg19 and mouse mm10, cellranger-atac reference 1.2.0
* - **GRCh38_atac_v1.1.0**
- Human GRCh38, cellranger-atac reference 1.1.0
* - **mm10_atac_v1.1.0**
- Mouse mm10, cellranger-atac reference 1.1.0
* - **hg19_atac_v1.1.0**
- Human hg19, cellranger-atac reference 1.1.0
* - **b37_atac_v1.1.0**
- Human b37 build, cellranger-atac reference 1.1.0
* - **GRCh38_and_mm10_atac_v1.1.0**
- Human GRCh38 and mouse mm10, cellranger-atac reference 1.1.0
* - **hg19_and_mm10_atac_v1.1.0**
- Human hg19 and mouse mm10, cellranger-atac reference 1.1.0

#. **Index** column.

96 changes: 4 additions & 92 deletions docs/cellranger/sc_sn_rnaseq.rst
@@ -25,49 +25,6 @@ Sample sheet
- Mouse mm10 (GENCODE vM23/Ensembl 98)
* - **GRCh38_and_mm10-2020-A**
- Human GRCh38 (GENCODE v32/Ensembl 98) and mouse mm10 (GENCODE vM23/Ensembl 98)
* - **GRCh38_v3.0.0**
- Human GRCh38, cellranger reference 3.0.0, Ensembl v93 gene annotation
* - **hg19_v3.0.0**
- Human hg19, cellranger reference 3.0.0, Ensembl v87 gene annotation
* - **mm10_v3.0.0**
- Mouse mm10, cellranger reference 3.0.0, Ensembl v93 gene annotation
* - **GRCh38_and_mm10_v3.1.0**
- Human (GRCh38) and mouse (mm10), cellranger references 3.1.0, Ensembl v93 gene annotations for both human and mouse
* - **hg19_and_mm10_v3.0.0**
- Human (hg19) and mouse (mm10), cellranger reference 3.0.0, Ensembl v93 gene annotations for both human and mouse
* - **GRCh38_v1.2.0** or **GRCh38**
- Human GRCh38, cellranger reference 1.2.0, Ensembl v84 gene annotation
* - **hg19_v1.2.0** or **hg19**
- Human hg19, cellranger reference 1.2.0, Ensembl v82 gene annotation
* - **mm10_v1.2.0** or **mm10**
- Mouse mm10, cellranger reference 1.2.0, Ensembl v84 gene annotation
* - **GRCh38_and_mm10_v1.2.0** or **GRCh38_and_mm10**
- Human and mouse, built from GRCh38 and mm10 cellranger references, Ensembl v84 gene annotations are used
* - **GRCh38_and_SARSCoV2**
- Human GRCh38 and SARS-COV-2 RNA genome, cellranger reference 3.0.0, generated by `Carly Ziegler`_. The SARS-COV-2 viral sequence and gtf are as described in `[Kim et al. Cell 2020]`_ (https://github.com/hyeshik/sars-cov-2-transcriptome, BetaCov/South Korea/KCDC03/2020 based on NC_045512.2). The GTF was edited to include only CDS regions, and regions were added to describe the 5' UTR ("SARSCoV2_5prime"), the 3' UTR ("SARSCoV2_3prime"), and reads aligning to anywhere within the Negative Strand("SARSCoV2_NegStrand"). Additionally, trailing A's at the 3' end of the virus were excluded from the SARSCoV2 fasta, as these were found to drive spurious viral alignment in pre-COVID19 samples.

Pre-built snRNA-seq references are summarized below.

.. list-table::
:widths: 5 20
:header-rows: 1

* - Keyword
- Description
* - **GRCh38_premrna_v3.0.0**
- Human, introns included, built from GRCh38 cellranger reference 3.0.0, Ensembl v93 gene annotation, treating annotated transcripts as exons
* - **GRCh38_premrna_v1.2.0** or **GRCh38_premrna**
- Human, introns included, built from GRCh38 cellranger reference 1.2.0, Ensembl v84 gene annotation, treating annotated transcripts as exons
* - **mm10_premrna_v1.2.0** or **mm10_premrna**
- Mouse, introns included, built from mm10 cellranger reference 1.2.0, Ensembl v84 gene annotation, treating annotated transcripts as exons
* - **GRCh38_premrna_and_mm10_premrna_v1.2.0** or **GRCh38_premrna_and_mm10_premrna**
- Human and mouse, introns included, built from GRCh38_premrna_v1.2.0 and mm10_premrna_v1.2.0
* - **GRCh38_premrna_and_SARSCoV2**
- Human, introns included, built from GRCh38_premrna_v3.0.0, and SARS-COV-2 RNA genome. This reference was generated by `Carly Ziegler`_. The SARS-COV-2 RNA genome is from `[Kim et al. Cell 2020]`_ (https://github.com/hyeshik/sars-cov-2-transcriptome, BetaCov/South Korea/KCDC03/2020 based on NC_045512.2). Please see the description of *GRCh38_and_SARSCoV2* above for details.

#. **Index** column.

Put `10x single cell RNA-seq sample index set names`_ (e.g. SI-GA-A12) here.

#. *Chemistry* column.

@@ -85,22 +42,9 @@ Sample sheet
- Single Cell 3′
* - **fiveprime**
- Single Cell 5′
* - **SC3Pv1**
- Single Cell 3′ v1
* - **SC3Pv2**
- Single Cell 3′ v2
* - **SC3Pv3**
- Single Cell 3′ v3. You should set cellranger version input parameter to >= 3.0.2
* - **SC3Pv4**
- Single Cell 3' v4. **Notice:** This is GEM-X chemistry, and only works for Cell Ranger v8.0.0+
* - **SC5P-PE**
- Single Cell 5′ paired-end (both R1 and R2 are used for alignment)
* - **SC5P-PE-v3**
- Single Cell 5' paired-end v3 (both R1 and R2 are used for alignment). **Notice:** This is GEM-X chemistry, and only works for Cell Ranger v8.0.0+
* - **SC5P-R2**
- Single Cell 5′ R2-only (where only R2 is used for alignment)
* - **SC5P-R2-v3**
- Single Cell 5' R2-only v3 (where only R2 is used for alignment). **Notice:** This is GEM-X chemistry, and only works for Cell Ranger v8.0.0+

#. *Flowcell* column.


#. *DataType* column.

@@ -140,38 +84,6 @@ For sc/snRNA-seq data, ``cellranger_workflow`` takes Illumina outputs as input a
- Output directory
- "gs://fc-e0000000-0000-0000-0000-000000000000/cellranger_output"
- Results are written under directory *output_directory* and will overwrite any existing files at this location.
* - run_mkfastq
- If you want to run ``cellranger mkfastq``
- true
- true
* - run_count
- If you want to run ``cellranger count``
- true
- true
* - delete_input_bcl_directory
- If delete BCL directories after demux. If false, you should delete this folder yourself so as to not incur storage charges
- false
- false
* - mkfastq_barcode_mismatches
- Number of mismatches allowed in matching barcode indices (bcl2fastq2 default is 1)
- 0
-
* - mkfastq_force_single_index
- If 10x-supplied i7/i5 paired indices are specified, but the flowcell was run with only one sample index, allow the demultiplex to proceed using the i7 half of the sample index pair
- false
- false
* - mkfastq_filter_single_index
- Only demultiplex samples identified by an i7-only sample index, ignoring dual-indexed samples. Dual-indexed samples will not be demultiplexed
- false
- false
* - mkfastq_use_bases_mask
- Override the read lengths as specified in *RunInfo.xml*
- "Y28n*,I8n*,N10,Y90n*"
-
* - mkfastq_delete_undetermined
- Delete undetermined FASTQ files generated by bcl2fastq2
- true
- false
* - force_cells
- Force pipeline to use this number of cells, bypassing the cell detection algorithm, mutually exclusive with expect_cells
- 6000
@@ -193,7 +105,7 @@ For sc/snRNA-seq data, ``cellranger_workflow`` takes Illumina outputs as input a
- false
- false
* - cellranger_version
- cellranger version, could be: 9.0.0, 8.0.1, 8.0.0, 7.2.0, 7.1.0, 7.0.1, 7.0.0, 6.1.2, 6.1.1, 6.0.2, 6.0.1, 6.0.0, 5.0.1, 5.0.0
- cellranger version, could be: 9.0.0, 8.0.1, 8.0.0, 7.2.0, 7.1.0, 7.0.1, 7.0.0
- "9.0.0"
- "9.0.0"
* - config_version
16 changes: 0 additions & 16 deletions docs/cellranger/sc_vdj.rst
@@ -19,22 +19,6 @@ Sample sheet
- Human GRCh38 V(D)J sequences, cellranger reference 7.0.0, annotation built from Ensembl *Homo_sapiens.GRCh38.94.chr_patch_hapl_scaff.gtf*
* - **GRCm38_vdj_v7.0.0**
- Mouse GRCm38 V(D)J sequences, cellranger reference 7.0.0, annotation built from Ensembl *Mus_musculus.GRCm38.94.gtf*
* - **GRCh38_vdj_v5.0.0**
- Human GRCh38 V(D)J sequences, cellranger reference 5.0.0, annotation built from Ensembl *Homo_sapiens.GRCh38.94.chr_patch_hapl_scaff.gtf*
* - **GRCm38_vdj_v5.0.0**
- Mouse GRCm38 V(D)J sequences, cellranger reference 5.0.0, annotation built from Ensembl *Mus_musculus.GRCm38.94.gtf*
* - **GRCh38_vdj_v4.0.0**
- Human GRCh38 V(D)J sequences, cellranger reference 4.0.0, annotation built from Ensembl *Homo_sapiens.GRCh38.94.chr_patch_hapl_scaff.gtf*
* - **GRCm38_vdj_v4.0.0**
- Mouse GRCm38 V(D)J sequences, cellranger reference 4.0.0, annotation built from Ensembl *Mus_musculus.GRCm38.94.gtf*
* - **GRCh38_vdj_v3.1.0**
- Human GRCh38 V(D)J sequences, cellranger reference 3.1.0, annotation built from Ensembl *Homo_sapiens.GRCh38.94.chr_patch_hapl_scaff.gtf*
* - **GRCm38_vdj_v3.1.0**
- Mouse GRCm38 V(D)J sequences, cellranger reference 3.1.0, annotation built from Ensembl *Mus_musculus.GRCm38.94.gtf*
* - **GRCh38_vdj_v2.0.0** or **GRCh38_vdj**
- Human GRCh38 V(D)J sequences, cellranger reference 2.0.0, annotation built from Ensembl *Homo_sapiens.GRCh38.87.chr_patch_hapl_scaff.gtf* and *vdj_GRCh38_alts_ensembl_10x_genes-2.0.0.gtf*
* - **GRCm38_vdj_v2.2.0** or **GRCm38_vdj**
- Mouse GRCm38 V(D)J sequences, cellranger reference 2.2.0, annotation built from Ensembl *Mus_musculus.GRCm38.90.chr_patch_hapl_scaff.gtf*

#. **Index** column.

2 changes: 1 addition & 1 deletion docs/spaceranger.rst
@@ -276,7 +276,7 @@ For spatial data, ``spaceranger_workflow`` takes Illumina outputs and related im
- 50
-
* - spaceranger_version
- spaceranger version, could be: 3.1.2, 3.0.1, 3.0.0, 2.1.1, 2.0.1, 2.0.0, 1.3.1, 1.3.0
- spaceranger version, could be: 3.1.2, 3.0.1, 3.0.0
- "3.1.2"
- "3.1.2"
* - config_version