Built on 2023-11-06, v0.1.6, doc update

neurobioinfo · Nov 10, 2023 · f8e66e7
1 parent 268dc59
commit f8e66e7

Showing 27 changed files with 352 additions and 225 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/docs/Step0.md b/docs/Step0.md
@@ -33,10 +33,13 @@ After running Step 0, the structure of the working directory should be:
 - The `logs/` directory records the events of each analytical step; <br />
 - The `parameters/` directory contains adjustable, step-specific text files which allow users to define the execution parameters for each analytical step. <br />
 
-Next, navigate to the `scrnabox_config.ini` file in `~/working_directory/job_info/configs` to define the path to the R library, the version of R, and the path to CellRanger:
+Next, navigate to the `scrnabox_config.ini` file in `~/working_directory/job_info/configs` to define the HPC account holder (**ACCOUNT**), the path to the environmental module (**MODULEUSE**), the path to CellRanger from the environmental module directory (**CELLRANGER**), CellRanger version (**CELLRANGER_VERSION**), R version (**R_VERSION**), and the path to the R library (**R_LIB_PATH**):
 
 ```
-MODULECELLRANGER=mugqic/cellranger/5.0.1
+ACCOUNT=account-name
+MODULEUSE=/path/to/environmental/module (e.g. /cvmfs/soft.mugqic/CentOS6/modulefiles)
+CELLRANGER=/path/to/cellranger/from/module/directory (e.g. mugqic/cellranger)
+CELLRANGER_VERSION=5.0.1
 R_VERSION=4.2.1
 R_LIB_PATH=/path/to/R/library
 ```
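
The split of the old `MODULECELLRANGER` value into `CELLRANGER` and `CELLRANGER_VERSION` can be illustrated with a small sketch. It assumes the configuration file uses plain `KEY=VALUE` lines and that the module name is rebuilt by joining the two values with `/`; neither detail is confirmed by this commit, and `parse_config` and `cellranger_module` are hypothetical helper names.

```python
def parse_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def cellranger_module(config):
    # Assumption: the module name is CELLRANGER and CELLRANGER_VERSION
    # joined by "/", which reproduces the old MODULECELLRANGER value.
    return f"{config['CELLRANGER']}/{config['CELLRANGER_VERSION']}"


# Example values taken from the "e.g." hints in the documentation.
example = """\
ACCOUNT=account-name
MODULEUSE=/cvmfs/soft.mugqic/CentOS6/modulefiles
CELLRANGER=mugqic/cellranger
CELLRANGER_VERSION=5.0.1
R_VERSION=4.2.1
R_LIB_PATH=/path/to/R/library
"""

config = parse_config(example)
print(cellranger_module(config))
```

With the example values, the sketch rebuilds the module name that the removed `MODULECELLRANGER` line used to hold.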

diff --git a/docs/Step1.md b/docs/Step1.md
@@ -51,8 +51,13 @@ The following parameters are adjustable for Step 1 of the **standard track** (`~
 |par_paired_end_seq|Yes| Whether or not paired-end sequencing was performed|
 |par_ref_dir_grch|NULL|Path to reference genome for FASTQ alignment. 10X Genomics reference genomes are available for download. For more information see the 10X Genomics [documentation](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_ct).|
 |par_r1_length|NULL|Minimum number of bases to retain for R1 sequence of gene expression|
-|par_include_introns|No|Whether or not to include intronic reads in the gene expression matrix|
+|par_r2_length|NULL|Minimum number of bases to retain for R2 sequence of gene expression|
 |par_mempercode|30|For clusters whose job managers do not support memory requests, it is possible to request memory in the form of cores. This option will scale up the number of threads requested via the __MRO_THREADS__ variable according to how much memory a stage requires when given to the ratio of memory on your nodes.|
+|par_include_introns|No|Whether or not to include intronic reads in the gene expression matrix|
+|par_no_target_umi_filter|No| Whether or not to turn off CellRanger's target UMI filtering subpipeline|
+|par_expect_cells|NULL| Expected number of cells. By default, CellRanger's auto-estimate algorithm will be used|
+|par_force_cells|NULL| Force the CellRanger count pipeline to use N cells.|
+|par_no_bam|No| Whether or not to skip the bam file generation in the CellRanger pipeline.|
 
 The following parameters are adjustable for Step 1 of the **HTO track** (`~/working_directory/job_info/parameters/step1_par.txt`):
 
@@ -69,10 +74,15 @@ The following parameters are adjustable for Step 1 of the **HTO track** (`~/work
 |read|R2|Which RNA sequencing read contains the barcode sequence. This value will be either R1 or R2.|
 |pattern|NULL|The pattern of the barcode identifiers|
 |sequence|NULL|The nucleotide sequence associated with the barcode identifier|
-|par_ref_dir_grch|NULL|Path to reference genome for FASTQ alignment. 10X Genomics reference genomes are available for download. For more information see their [documentation](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_ct).|
+|par_ref_dir_grch|NULL|Path to reference genome for FASTQ alignment. 10X Genomics reference genomes are available for download. For more information see the 10X Genomics [documentation](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_ct).|
 |par_r1_length|NULL|Minimum number of bases to retain for R1 sequence of gene expression|
-|par_include_introns|No|Whether or not to include intronic reads in the gene expression matrix|
+|par_r2_length|NULL|Minimum number of bases to retain for R2 sequence of gene expression|
 |par_mempercode|30|For clusters whose job managers do not support memory requests, it is possible to request memory in the form of cores. This option will scale up the number of threads requested via the __MRO_THREADS__ variable according to how much memory a stage requires when given to the ratio of memory on your nodes.|
+|par_include_introns|No|Whether or not to include intronic reads in the gene expression matrix|
+|par_no_target_umi_filter|No| Whether or not to turn off CellRanger's target UMI filtering subpipeline|
+|par_expect_cells|NULL| Expected number of cells. By default, CellRanger's auto-estimate algorithm will be used|
+|par_force_cells|NULL| Force the CellRanger count pipeline to use N cells.|
+|par_no_bam|No| Whether or not to skip the bam file generation in the CellRanger pipeline.|
 
 Given that CellRanger runs a user interface, it is recommended to run Step 1 in a **'screen'**, which will allow the task to keep running if the connection is broken. To run Step 1, use the following command:
 ```

diff --git a/docs/Step2.md b/docs/Step2.md
@@ -12,11 +12,10 @@ The following parameters are adjustable for Step 2 (`~/working_directory/job_inf
 
 |Parameter|Default|Description|
 |:--|:--|:--|
-|par_save_RNA| No| Whether or not to export an RNA expression matrix|
-|par_save_metadata| No|Whether or not to export a metadata dataframe|
+|par_save_RNA| Yes| Whether or not to export an RNA expression matrix|
+|par_save_metadata| Yes|Whether or not to export a metadata dataframe|
 |par_ambient_RNA| Yes|Whether or not to correct the feature-barcode expression matrices for ambient RNA contamination|
-|par_count_matrices| NULL|If users skipped Step 1, they may provide the path to a directory that contains existing feature-barcode expression matrices to initiate the pipeline at Step 2 |
-|par_min.cells_L| 0|Only retain genes expressed in a minimum number of cells|
+|par_min.cells_L| 3|Only retain genes expressed in a minimum number of cells|
 |par_normalization.method|LogNormalize|Method to use for normalization|
 |par_scale.factor|10000|Scale factor for scaling the data|
 |par_selection.method|vst|Method for choosing the top variable features|

diff --git a/docs/Step3.md b/docs/Step3.md
@@ -14,17 +14,17 @@ The following parameters are adjustable for Step 3 (`~/working_directory/job_inf
 
 |Parameter|Default|Description|
 |:--|:--|:--|
-|par_save_RNA| No| Whether or not to export an RNA expression matrix|
-|par_save_metadata| No|Whether or not to export a metadata dataframe|
+|par_save_RNA| Yes| Whether or not to export an RNA expression matrix|
+|par_save_metadata| Yes|Whether or not to export a metadata dataframe|
 |par_seurat_object| NULL |If users already have a Seurat object(s), they may provide the path to a directory that contains an existing Seurat object(s) to initiate the pipeline at Step 3|
-|par_nFeature_RNA_L|NULL |Only retain cells expressing a minimum number of unique RNA transcripts|
-|par_nFeature_RNA_U|NULL |Only retain cells expressing a maximum number of unique RNA transcripts|
-|par_nCount_RNA_L|NULL |Only retain cells with a minimum number of total RNA transcripts|
-|par_nCount_RNA_U|NULL |Only retain cells with a maximum number of total RNA transcripts|
-|par_mitochondria_percent_L|NULL | Only retain cells with a minimum percentage of mitochondrial-encoded genes|
-|par_mitochondria_percent_U|NULL |Only retain cells with a maximum percentage of mitochondrial-encoded genes|
-|par_ribosomal_percent_L|NULL |Only retain cells with a minimum percentage of ribosome genes|
-|par_ribosomal_percent_U|NULL |Only retain cells with a maximum percentage of ribosome genes|
+|par_nFeature_RNA_L|300 |Only retain cells expressing a minimum number of unique RNA transcripts|
+|par_nFeature_RNA_U|10000 |Only retain cells expressing a maximum number of unique RNA transcripts|
+|par_nCount_RNA_L|300 |Only retain cells with a minimum number of total RNA transcripts|
+|par_nCount_RNA_U|20000 |Only retain cells with a maximum number of total RNA transcripts|
+|par_mitochondria_percent_L|0 | Only retain cells with a minimum percentage of mitochondrial-encoded genes|
+|par_mitochondria_percent_U|20 |Only retain cells with a maximum percentage of mitochondrial-encoded genes|
+|par_ribosomal_percent_L|0 |Only retain cells with a minimum percentage of ribosome genes|
+|par_ribosomal_percent_U|100 |Only retain cells with a maximum percentage of ribosome genes|
 |par_remove_mitochondrial_genes|No| Whether or not to remove mitochondrial genes|
 |par_remove_ribosomal_genes|No| Whether or not to remove ribosomal genes|
 |par_remove_genes|NULL|If users want to remove specific genes from their data, they may define a list of gene identifiers|
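
The paired `_L` and `_U` defaults above act as lower and upper bounds on per-cell quality metrics. The following sketch applies them to a few made-up cells. It assumes the bounds are inclusive, which this page does not state, and the metric keys and `passes_qc` are hypothetical names rather than the pipeline's own.

```python
# Lower/upper bounds copied from the new Step 3 defaults.
QC_DEFAULTS = {
    "nFeature_RNA": (300, 10000),
    "nCount_RNA": (300, 20000),
    "mitochondria_percent": (0, 20),
    "ribosomal_percent": (0, 100),
}


def passes_qc(cell, thresholds=QC_DEFAULTS):
    """Return True if every metric lies within its bounds.

    Assumption: both bounds are inclusive.
    """
    return all(
        lower <= cell[metric] <= upper
        for metric, (lower, upper) in thresholds.items()
    )


# Made-up cells for illustration only.
cells = [
    {"id": "cell_a", "nFeature_RNA": 1200, "nCount_RNA": 3500,
     "mitochondria_percent": 4.2, "ribosomal_percent": 12.0},
    {"id": "cell_b", "nFeature_RNA": 150, "nCount_RNA": 400,
     "mitochondria_percent": 3.0, "ribosomal_percent": 10.0},
    {"id": "cell_c", "nFeature_RNA": 2000, "nCount_RNA": 6000,
     "mitochondria_percent": 35.0, "ribosomal_percent": 8.0},
]

kept = [cell["id"] for cell in cells if passes_qc(cell)]
print(kept)
```

Here `cell_b` falls below the minimum feature count and `cell_c` exceeds the maximum mitochondrial percentage, so only `cell_a` is retained.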

diff --git a/docs/Step4.md b/docs/Step4.md
@@ -6,13 +6,13 @@ The following parameters are adjustable for Step 4 of the standard track (`~/wor
 
 |Parameter|Default|Description|
 |:--|:--|:--|
-|par_save_RNA| No| Whether or not to export an RNA expression matrix|
-|par_save_metadata| No|Whether or not to export a metadata dataframe|
+|par_save_RNA| Yes| Whether or not to export an RNA expression matrix|
+|par_save_metadata| Yes|Whether or not to export a metadata dataframe|
 |par_seurat_object| NULL |If users already have a Seurat object(s), they may provide the path to a directory that contains an existing Seurat object(s) to initiate the pipeline at Step 4|
-|par_RunUMAP_dims|10| Number of dimensions to use as input features for uniform manifold approximation and projection (UMAP)|
-|par_RunUMAP_n.neighbors|65|Number of neighboring points used in local approximations of manifold structure|
+|par_RunUMAP_dims|25| Number of dimensions to use as input features for uniform manifold approximation and projection (UMAP)|
+|par_RunUMAP_n.neighbors|45|Number of neighboring points used in local approximations of manifold structure|
 |par_dropDN| Yes| Whether or not to remove predicted doublets from downstream analyses|
-|par_PCs|20| The number of statistically significant principal components. Can be informed by elbow plot produced in Step 3|
+|par_PCs|25| The number of statistically significant principal components. Can be informed by elbow plot produced in Step 3|
 |par_pN|0.25| The number of artificial doublets to generate. DoubletFinder is largely invariant to this parameter. We suggest keeping 0.25|
 |par_sct|FALSE|Logical representing whether SCTransform was used during original Seurat object pre-processing|
 |par_sample_names|NULL| A list of sample names for each sample in the experiment, corresponding to the expected doublet rates listed in the parameter below. Sample names should be the same as those used to produce the `samples_info` folder during the setup procedures.|

diff --git a/docs/Step4HTO.md b/docs/Step4HTO.md
@@ -7,8 +7,8 @@ The following parameters are adjustable for Step 4 (`~/working_directory/job_inf
 
 |Parameter|Default|Description|
 |:--|:--|:--|
-|par_save_RNA| No| Whether or not to export an RNA expression matrix|
-|par_save_metadata| No|Whether or not to export a metadata dataframe|
+|par_save_RNA| Yes| Whether or not to export an RNA expression matrix|
+|par_save_metadata| Yes|Whether or not to export a metadata dataframe|
 |par_seurat_object| NULL |If users already have a Seurat object(s), they may provide the path to a directory that contains an existing Seurat object(s) to initiate the pipeline at Step 4|
 |par_normalization.method|CLR|Method for normalizing the HTO assay|
 |par_scale.factor| 1000|Scale factor for scaling the HTO assay|

diff --git a/docs/Step5.md b/docs/Step5.md
@@ -8,8 +8,8 @@ The following parameters are adjustable for Step 5 (`~/working_directory/job_inf
 
 |Parameter|Default|Description|
 |:--|:--|:--|
-|par_save_RNA| No| Whether or not to export an RNA expression matrix|
-|par_save_metadata| No|Whether or not to export a metadata dataframe|
+|par_save_RNA| Yes| Whether or not to export an RNA expression matrix|
+|par_save_metadata| Yes|Whether or not to export a metadata dataframe|
 |par_seurat_object| NULL |If users already have a Seurat object(s), they may provide the path to a directory that contains an existing Seurat object(s) to initiate the pipeline at Step 5|
 |par_one_seurat| No| Whether or not the experiment comprises only one sequencing run. If this parameter is set to "Yes", set par_integrate_seurat and par_merge_seurat to "No".|
 |par_integrate_seurat| Yes| Whether or not to integrate the samples. If "Yes", par_merge_seurat must be "No". |
@@ -21,10 +21,11 @@ The following parameters are adjustable for Step 5 (`~/working_directory/job_inf
 |par_nfeatures|2500|Number of features to select as top variable features|
 |par_FindIntegrationAnchors_dim|25|Which dimensions to use from the canonical correlation analysis (CCA) to specify the neighbor search space|
 |par_RunPCA_npcs|30| Total Number of principal components to compute and store for principal component analysis (PCA)|
-|par_RunUMAP_dims|10| Number of dimensions to use as input features for uniform manifold approximation and projection (UMAP)|
-|par_RunUMAP_n.neighbors|65|Number of neighboring points used in local approximations of manifold structure|
+|par_RunUMAP_dims|25| Number of dimensions to use as input features for uniform manifold approximation and projection (UMAP)|
+|par_RunUMAP_n.neighbors|45|Number of neighboring points used in local approximations of manifold structure|
 |par_compute_jackstraw |No|Whether or not to perform JackStraw computation. This computation takes a long time.|
 
+
  - - - -
 
 To run Step 5, use the following command:

diff --git a/docs/Step6.md b/docs/Step6.md
@@ -5,14 +5,15 @@ The following parameters are adjustable for Step 6 (`~/working_directory/job_inf
 
 |Parameter|Default|Description|
 |:--|:--|:--|
-|par_save_RNA| No| Whether or not to export an RNA expression matrix|
-|par_save_metadata| No|Whether or not to export a metadata dataframe|
+|par_save_RNA| Yes| Whether or not to export an RNA expression matrix|
+|par_save_metadata| Yes|Whether or not to export a metadata dataframe|
 |par_seurat_object| NULL |If users already have a Seurat object, they may provide the path to the Seurat object to initiate the pipeline at Step 6|
 |par_skip_integration|No|Whether or not the user skipped integration in Step 5| 
-|par_FindNeighbors_dims|30| Number of dimensions from linear dimensional reduction used as input to identify neighbours. Can be informed by the elbow and Jackstraw plots produced in Step 5|
-|par_FindNeighbors_k.param|60|Defines k for the k-nearest neighbor algorithm|
+|par_FindNeighbors_dims|25| Number of dimensions from linear dimensional reduction used as input to identify neighbours. Can be informed by the elbow and Jackstraw plots produced in Step 5|
+|par_RunUMAP_dims|25| Number of dimensions to use as input features for uniform manifold approximation and projection (UMAP)|
+|par_FindNeighbors_k.param|45|Defines k for the k-nearest neighbor algorithm|
 |par_FindNeighbors_prune.SNN|1/15|Sets the cutoff for acceptable Jaccard index when computing the neighborhood overlap for the shared nearest-neighbour (SNN) construction|
-|par_FindClusters_resolution|0, 0.05, 0.2, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0|Value of the clustering resolution parameter. You may provide multiple resolution values|
+|par_FindClusters_resolution|0, 0.05, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0|Value of the clustering resolution parameter. You may provide multiple resolution values|
 |par_compute_ARI|Yes| Whether or not you want to compute the Adjusted Rand Index (ARI) between clusters at a given clustering resolution|
 |par_RI_reps|25|Number of iterations for clustering the data at a given resolution in order to calculate the ARI|
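
Each value in `par_FindClusters_resolution` is later referred to by a metadata column name such as `integrated_snn_res.0.75` (the Step 7 default) or an `RNA_snn_res.` name when integration is skipped. The sketch below builds such names from the new default resolutions. It assumes trailing zeros are dropped, as in R's default number formatting, so that `1.0` becomes `1`; `cluster_column` is a hypothetical helper, not part of the pipeline.

```python
def cluster_column(resolution, assay="integrated"):
    """Build a clustering column name such as 'integrated_snn_res.0.75'.

    Assumption: trailing zeros are dropped (1.0 -> "1"), mirroring how R
    prints numbers by default. The "g" format does this for these values.
    """
    return f"{assay}_snn_res.{resolution:g}"


# New default resolutions from the Step 6 table.
resolutions = [0, 0.05, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0]

for res in resolutions:
    print(cluster_column(res))

# Name to use when integration was skipped in Step 5.
print(cluster_column(0.75, assay="RNA"))
```

Checking the generated names against the columns actually present in the Seurat metadata is advisable before setting `par_level_cluster`.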
 

diff --git a/docs/Step7.md b/docs/Step7.md
@@ -18,12 +18,12 @@ The following parameters are adjustable for Step 7 (`~/working_directory/job_inf
 
 |Annotation tool|Parameter|Default|Description|
 |:--|:--|:--|:--|
-|**General**|par_save_RNA| No| Whether or not to export an RNA expression matrix|
-|**General**|par_save_metadata| No|Whether or not to export a metadata dataframe|
+|**General**|par_save_RNA| Yes| Whether or not to export an RNA expression matrix|
+|**General**|par_save_metadata| Yes|Whether or not to export a metadata dataframe|
 |**General**|par_seurat_object| NULL |If users already have a Seurat object, they may provide the path to the Seurat object to initiate the pipeline at Step 7|
-|**General**|par_level_cluster| integrated_snn_res.0.7| The cluster resolution that you want to annotate. If you skipped integration in Step 5, use par_level_cluster='RNA_snn_res.0.7', if you want to proceed with a clustering resolution of 0.7.|
+|**General**|par_level_cluster| integrated_snn_res.0.75| The cluster resolution that you want to annotate. If you skipped integration in Step 5, use par_level_cluster='RNA_snn_res.0.7', if you want to proceed with a clustering resolution of 0.7.|
 |**Tool 1**|par_run_find_marker|Yes|Whether or not to find marker genes for each cluster|
-|**Tool 1**|par_run_enrichR|Yes|Whether or not to run gene set enrichment analysis (GSEA) on the marker genes for each cluster using the EnrichR tools. Note that the HPC must have access to the internet to run GSEA.|
+|**Tool 1**|par_run_enrichR|No|Whether or not to run gene set enrichment analysis (GSEA) on the marker genes for each cluster using the EnrichR tools. Note that the HPC must have access to the internet to run GSEA.|
 |**Tool 1**|par_top_sel|5|Number of top markers to identify based on avg_log2FC|
 |**Tool 1**|par_db|Descartes_Cell_Types_and_Tissue_2021,<br /> CellMarker_Augmented_2021,<br />Azimuth_Cell_Types_2021|Character vector of EnrichR databases that define cell types. The top marker genes for each cluster will be tested for enrichment across these databases.|
 |**Tool 2**|par_run_module_score|Yes|Whether or not to compute module score for aggregated expression |
@@ -34,10 +34,10 @@ The following parameters are adjustable for Step 7 (`~/working_directory/job_inf
 |**Tool 3**|par_reference|NULL| Path defining the location of the reference Seurat object|
 |**Tool 3**|par_reference_name|Reference| An arbitrary name for the reference object. This will be used to name the metadata slot.|
 |**Tool 3**|par_level_celltype|NULL|The name of the metadata column in the reference Seurat object that defines cell types|
-|**Tool 3**|par_FindTransferAnchors_dim|10| Number of dimensions from linear dimensional reduction used to find transfer anchors between the reference and query Seurat objects|
-|**Tool 3**|par_futureglobalsmaxSize|50000 * 1024^2|This will increase your RAM usage so set this number mindfully|
-|**Annotate**|par_annotate_resolution|NULL| Which clustering resolution you want to annotate|
-|**Annotate**|par_name_metadata|clustering_label_1| The name of the metadata slot that will contain the annotations|
+|**Tool 3**|par_FindTransferAnchors_dim|50| Number of dimensions from linear dimensional reduction used to find transfer anchors between the reference and query Seurat objects|
+|**Tool 3**|par_futureglobalsmaxSize|60000 * 1024^2|This will increase your RAM usage so set this number mindfully|
+|**Annotate**|par_annotate_resolution|integrated_snn_res.0.75| Which clustering resolution you want to annotate|
+|**Annotate**|par_name_metadata|Celltypes1| The name of the metadata slot that will contain the annotations|
 |**Annotate**|par_annotate_labels|NULL| A list of cluster labels. There must be as many labels as clusters at the defined clustering resolution. Please refrain from using "_" when annotating.|
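
To make the `par_futureglobalsmaxSize` default easier to reason about, the arithmetic can be worked through directly. The sketch assumes the value is interpreted as a byte count, so that `60000 * 1024^2` means 60000 MiB; the helper names are hypothetical.

```python
MIB = 1024 ** 2  # bytes per mebibyte
GIB = 1024 ** 3  # bytes per gibibyte


def max_size_bytes(mebibytes):
    """Evaluate 'N * 1024^2' as a byte count (assumed interpretation)."""
    return mebibytes * MIB


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / GIB


old_default = max_size_bytes(50000)
new_default = max_size_bytes(60000)

print(f"old: {old_default} bytes = {to_gib(old_default):.2f} GiB")
print(f"new: {new_default} bytes = {to_gib(new_default):.2f} GiB")
```

Under that interpretation the new default allows roughly 58.6 GiB of exported globals, up from about 48.8 GiB, so the job's memory request should leave room for it.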
 
  - - - -

diff --git a/docs/Step8.md b/docs/Step8.md
@@ -12,8 +12,8 @@ The following parameters are adjustable for Step 8:
 
 |DGE method|Parameter|Default|Description|
 |:--|:--|:--|:--|
-|**General**|par_save_RNA| No| Whether or not to export an RNA expression matrix|
-|**General**|par_save_metadata| No|Whether or not to export a metadata dataframe|
+|**General**|par_save_RNA| Yes| Whether or not to export an RNA expression matrix|
+|**General**|par_save_metadata| Yes|Whether or not to export a metadata dataframe|
 |**General**|par_seurat_object| NULL |If users already have a Seurat object, they may provide the path to the Seurat object to initiate the pipeline at Step 8|
 |**Add metadata**|par_merge_meta|orig.ident|The column from the Seurat metadata that will be used to merge the new metadata. This column must also exist in the submitted csv file containing new metadata.|
 |**Add metadata**|par_metadata|NULL|csv file containing metadata to be added to the Seurat object|