From 38d2a69928ba42600d488c39a556b7187592c55e Mon Sep 17 00:00:00 2001
From: Stephanie Spielman <stephanie.spielman@gmail.com>
Date: Thu, 16 Jan 2025 11:59:29 -0500
Subject: [PATCH 1/9] add draft for seurat section

---
 docs/troubleshooting-faq/faq.md | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/docs/troubleshooting-faq/faq.md b/docs/troubleshooting-faq/faq.md
index d339e1870..b6cd3e31e 100644
--- a/docs/troubleshooting-faq/faq.md
+++ b/docs/troubleshooting-faq/faq.md
@@ -1,8 +1,8 @@
 # Frequently asked questions
 
-### Why didn't the sample/project I specified when running the [data download script](../getting-started/accessing-resources/getting-access-to-data.md#using-the-download-data-script) download?
+### Why didn't the sample/project I specified when running the data download script download?
 
-First, we recommend using the `--dryrun` flag when running the `download-data.py` script to check which files _would_ be downloaded.
+First, we recommend using the `--dryrun` flag when running the [data download script](../getting-started/accessing-resources/getting-access-to-data.md#using-the-download-data-script) to check which files _would_ be downloaded.
 This will confirm that there is nothing wrong with your internet connection and that you are properly [logged into your AWS profile](../technical-setup/environment-setup/configure-aws-cli.md#logging-in-to-a-new-session).
 
 If running the script with `--dryrun` states that _only_ the `DATA_USAGE.md` file is being downloaded, this means the data files you are attempting to download do not exist.
@@ -117,3 +117,19 @@ However, there may be circumstances when you want to use results from a module w
 In such cases, you will need to run the module yourself to generate the results.
 Instructions for [running the module](../contributing-to-analyses/analysis-modules/running-a-module.md), including its software and compute requirements, should be available in the module's main `README.md` file.
 After running the module, results will generally be stored in `analysis/{module name}/results`, and the module's documentation should describe the contents of result files.
+
+
+### What if I want to use Seurat?
+
+While [data downloads](../getting-started/accessing-resources/getting-access-to-data.md) are only available in `SingleCellExperiment` and `AnnData` format, `Seurat` versions of all objects (in [v5 assay format](https://satijalab.org/seurat/articles/seurat5_essential_commands)) are also available for use.
+
+These files are part of the OpenScPCA results, associated with the module [`seurat-conversion`](https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/seurat-conversion) which we wrote to convert the processed `SingleCellExperiment` objects to `Seurat` format.
+For more information on obtaining result files, please refer to the documentation for [the `download-results.py` script](../getting-started/accessing-resources/getting-access-to-data.md#accessing-scpca-module-results).
+
+When working with these `Seurat` objects, please bear in mind the following:
+
+* They were _not_ processed with a `Seurat` pipeline.
+They were processed using the same pipeline as all OpenScPCA objects (e.g., with `Bioconductor`), and then converted to a `Seurat` format
+    * Notably, they do contain the raw data counts, allowing you to perform normalization, dimension reduction, etc. with `Seurat` directly if you so choose
+* To be more consistent with `Seurat` analysis pipelines, gene names in these objects use gene symbols rather than Ensembl ids
+

From e08e68335516425155e218b7077c37ca30c99b05 Mon Sep 17 00:00:00 2001
From: Stephanie Spielman <stephanie.spielman@gmail.com>
Date: Thu, 16 Jan 2025 12:06:50 -0500
Subject: [PATCH 2/9] Add draft of gene conversion section

---
 docs/troubleshooting-faq/faq.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/troubleshooting-faq/faq.md b/docs/troubleshooting-faq/faq.md
index b6cd3e31e..9f6601d77 100644
--- a/docs/troubleshooting-faq/faq.md
+++ b/docs/troubleshooting-faq/faq.md
@@ -130,6 +130,19 @@ When working with these `Seurat` objects, please bear in mind the following:
 
 * They were _not_ processed with a `Seurat` pipeline.
 They were processed using the same pipeline as all OpenScPCA objects (e.g., with `Bioconductor`), and then converted to a `Seurat` format
-    * Notably, they do contain the raw data counts, allowing you to perform normalization, dimension reduction, etc. with `Seurat` directly if you so choose
+  * Notably, they do contain the raw data counts, allowing you to perform normalization, dimension reduction, etc. with `Seurat` directly if you so choose
 * To be more consistent with `Seurat` analysis pipelines, gene names in these objects use gene symbols rather than Ensembl ids
 
+
+### The ScPCA data objects contain Ensembl ids, but I need gene symbols for my analysis. How should I perform this conversion?
+
+In an effort to keep this consistent across the OpenScPCA project, we provide functions to convert Ensembl ids to gene symbols in an R package we maintain called [`rOpenScPCA`](https://github.com/AlexsLemonade/rOpenScPCA/).
+
+This package has two particular functions to support this task:
+
+* `rOpenScPCA::sce_to_symbols()`
+  * This function converts row names in a `SingleCellExperiment` object from Ensembl ids to gene symbols
+* `rOpenScPCA::ensembl_to_symbol()`
+  * This function converts a vector of Ensembl ids to a vector of gene symbols
+
+Please refer to these functions' help menus (e.g., `?rOpenScPCA::sce_to_symbols`) for additional information on how to use them.

From 7d8c80446af02593becf804ac7eab7acc137bd05 Mon Sep 17 00:00:00 2001
From: Stephanie Spielman <stephanie.spielman@gmail.com>
Date: Thu, 16 Jan 2025 12:15:06 -0500
Subject: [PATCH 3/9] fix bad spacing I noticed

---
 analyses/hello-clusters/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/analyses/hello-clusters/README.md b/analyses/hello-clusters/README.md
index b095a73c8..2f29f007f 100644
--- a/analyses/hello-clusters/README.md
+++ b/analyses/hello-clusters/README.md
@@ -56,10 +56,10 @@ renv::update("rOpenScPCA")
 ## Example notebooks
 
 1. The `01_perform-evaluate-clustering.Rmd` notebook shows examples of:
-  - Performing clustering with `rOpenScPCA::calculate_clusters()`
-  - Evaluating clustering with `rOpenScPCA::calculate_silhouette()`, `rOpenScPCA::calculate_purity()`, and `rOpenScPCA::calculate_stability()`
+    - Performing clustering with `rOpenScPCA::calculate_clusters()`
+    - Evaluating clustering with `rOpenScPCA::calculate_silhouette()`, `rOpenScPCA::calculate_purity()`, and `rOpenScPCA::calculate_stability()`
 It also contains explanations for how to interpret cluster quality metrics.
 
 2. The `02_compare-clustering-parameters.Rmd` notebook shows examples of:
-  - Performing clustering across a set of parameterizations with `rOpenScPCA::sweep_clusters()`
-  - Comparing and visualizing multiple sets of clustering results
+    - Performing clustering across a set of parameterizations with `rOpenScPCA::sweep_clusters()`
+    - Comparing and visualizing multiple sets of clustering results

From 3f169840ede148c974acb130157f9a06b2624189 Mon Sep 17 00:00:00 2001
From: Stephanie Spielman <stephanie.spielman@gmail.com>
Date: Thu, 16 Jan 2025 12:15:32 -0500
Subject: [PATCH 4/9] add draft of clusters section

---
 docs/troubleshooting-faq/faq.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/troubleshooting-faq/faq.md b/docs/troubleshooting-faq/faq.md
index 9f6601d77..c41f49d70 100644
--- a/docs/troubleshooting-faq/faq.md
+++ b/docs/troubleshooting-faq/faq.md
@@ -146,3 +146,17 @@ This package has two particular functions to support this task:
   * This function converts a vector of Ensembl ids to a vector of gene symbols
 
 Please refer to these functions' help menus (e.g., `?rOpenScPCA::sce_to_symbols`) for additional information on how to use them.
+
+### I noticed there are cluster assignments in the processed data files. Should I use those or re-cluster the data?
+
+All ScPCA data objects contain cluster assignments which were [calculated using an automated pipeline](https://scpca.readthedocs.io/en/stable/processing_information.html#processed-gene-expression-data).
+Because the clustering parameters used in this automated pipeline were not tailored to any given dataset, we do not recommend relying on these clusters for downstream analysis.
+Instead, we strongly recommend re-clustering the data _and_ evaluating your cluster assignments before using them.
+
+To support clustering analysis and evaluation, we provide several functions in an R package we maintain called [`rOpenScPCA`](https://github.com/AlexsLemonade/rOpenScPCA/) to accomplish the following tasks:
+
+* Perform graph-based clustering
+* Evaluate clustering results with quality control metrics
+* Calculate several sets of clustering results across parameter space
+
+We also provide an OpenScPCA analysis module [`hello-clusters`](https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/hello-clusters) with example notebooks demonstrating how to use clustering functionality in `rOpenScPCA`.

From 96443ed5554a0f191539d48be3cd2a9937e488d9 Mon Sep 17 00:00:00 2001
From: Stephanie Spielman <stephanie.spielman@gmail.com>
Date: Thu, 16 Jan 2025 12:18:51 -0500
Subject: [PATCH 5/9] a few text cleanups

---
 docs/troubleshooting-faq/faq.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/troubleshooting-faq/faq.md b/docs/troubleshooting-faq/faq.md
index c41f49d70..f1ef90761 100644
--- a/docs/troubleshooting-faq/faq.md
+++ b/docs/troubleshooting-faq/faq.md
@@ -129,7 +129,7 @@ For more information on obtaining result files, please refer to the documentatio
 When working with these `Seurat` objects, please bear in mind the following:
 
 * They were _not_ processed with a `Seurat` pipeline.
-They were processed using the same pipeline as all OpenScPCA objects (e.g., with `Bioconductor`), and then converted to a `Seurat` format
+They were processed using the same pipeline as all OpenScPCA objects were (e.g., with `Bioconductor`), and then converted to `Seurat` format
   * Notably, they do contain the raw data counts, allowing you to perform normalization, dimension reduction, etc. with `Seurat` directly if you so choose
 * To be more consistent with `Seurat` analysis pipelines, gene names in these objects use gene symbols rather than Ensembl ids
 
@@ -137,6 +137,7 @@ They were processed using the same pipeline as all OpenScPCA objects (e.g., with
 ### The ScPCA data objects contain Ensembl ids, but I need gene symbols for my analysis. How should I perform this conversion?
 
 In an effort to keep this consistent across the OpenScPCA project, we provide functions to convert Ensembl ids to gene symbols in an R package we maintain called [`rOpenScPCA`](https://github.com/AlexsLemonade/rOpenScPCA/).
+Installation instructions are provided in the `rOpenScPCA` GitHub repository.
 
 This package has two particular functions to support this task:
 
@@ -145,7 +146,7 @@ This package has two particular functions to support this task:
 * `rOpenScPCA::ensembl_to_symbol()`
   * This function converts a vector of Ensembl ids to a vector of gene symbols
 
-Please refer to these functions' help menus (e.g., `?rOpenScPCA::sce_to_symbols`) for additional information on how to use them.
+Please refer to these functions' help menus (e.g., `?rOpenScPCA::sce_to_symbols`) for additional information on their use.
 
 ### I noticed there are cluster assignments in the processed data files. Should I use those or re-cluster the data?
 
@@ -156,7 +157,8 @@ Instead, we strongly recommend re-clustering the data _and_ evaluating your clus
 To support clustering analysis and evaluation, we provide several functions in an R package we maintain called [`rOpenScPCA`](https://github.com/AlexsLemonade/rOpenScPCA/) to accomplish the following tasks:
 
 * Perform graph-based clustering
-* Evaluate clustering results with quality control metrics
-* Calculate several sets of clustering results across parameter space
+* Evaluate clustering results with several quality control metrics
+* Calculate different sets of clustering results across parameter space in order to identify an optimal clustering scheme
 
 We also provide an OpenScPCA analysis module [`hello-clusters`](https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/hello-clusters) with example notebooks demonstrating how to use clustering functionality in `rOpenScPCA`.
+This module also provides instructions on how to install `rOpenScPCA`.

From e3f7cfcc5a5a0676a6048a733c96f51f82275086 Mon Sep 17 00:00:00 2001
From: Stephanie Spielman <stephanie.spielman@gmail.com>
Date: Thu, 16 Jan 2025 12:23:53 -0500
Subject: [PATCH 6/9] restore script name

---
 docs/troubleshooting-faq/faq.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/troubleshooting-faq/faq.md b/docs/troubleshooting-faq/faq.md
index f1ef90761..5e4a721cd 100644
--- a/docs/troubleshooting-faq/faq.md
+++ b/docs/troubleshooting-faq/faq.md
@@ -2,7 +2,7 @@
 
 ### Why didn't the sample/project I specified when running the data download script download?
 
-First, we recommend using the `--dryrun` flag when running the [data download script](../getting-started/accessing-resources/getting-access-to-data.md#using-the-download-data-script) to check which files _would_ be downloaded.
+First, we recommend using the `--dryrun` flag when running the [`download-data.py` script](../getting-started/accessing-resources/getting-access-to-data.md#using-the-download-data-script) to check which files _would_ be downloaded.
 This will confirm that there is nothing wrong with your internet connection and that you are properly [logged into your AWS profile](../technical-setup/environment-setup/configure-aws-cli.md#logging-in-to-a-new-session).
 
 If running the script with `--dryrun` states that _only_ the `DATA_USAGE.md` file is being downloaded, this means the data files you are attempting to download do not exist.

From f12c243b6857a73e2c4e3fe9422793f0587baaa8 Mon Sep 17 00:00:00 2001
From: Stephanie Spielman <stephanie.spielman@gmail.com>
Date: Thu, 16 Jan 2025 13:12:20 -0500
Subject: [PATCH 7/9] Apply suggestions from code review

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>
---
 docs/troubleshooting-faq/faq.md | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/docs/troubleshooting-faq/faq.md b/docs/troubleshooting-faq/faq.md
index 5e4a721cd..837c6ec02 100644
--- a/docs/troubleshooting-faq/faq.md
+++ b/docs/troubleshooting-faq/faq.md
@@ -121,17 +121,18 @@ After running the module, results will generally be stored in `analysis/{module
 
 ### What if I want to use Seurat?
 
-While [data downloads](../getting-started/accessing-resources/getting-access-to-data.md) are only available in `SingleCellExperiment` and `AnnData` format, `Seurat` versions of all objects (in [v5 assay format](https://satijalab.org/seurat/articles/seurat5_essential_commands)) are also available for use.
+While [data downloads](../getting-started/accessing-resources/getting-access-to-data.md) are only available in `SingleCellExperiment` and `AnnData` formats, `Seurat` versions of processed objects (in [v5 assay format](https://satijalab.org/seurat/articles/seurat5_essential_commands)) are also available for use.
 
 These files are part of the OpenScPCA results, associated with the module [`seurat-conversion`](https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/seurat-conversion) which we wrote to convert the processed `SingleCellExperiment` objects to `Seurat` format.
 For more information on obtaining result files, please refer to the documentation for [the `download-results.py` script](../getting-started/accessing-resources/getting-access-to-data.md#accessing-scpca-module-results).
 
 When working with these `Seurat` objects, please bear in mind the following:
 
-* They were _not_ processed with a `Seurat` pipeline.
-They were processed using the same pipeline as all OpenScPCA objects were (e.g., with `Bioconductor`), and then converted to `Seurat` format
-  * Notably, they do contain the raw data counts, allowing you to perform normalization, dimension reduction, etc. with `Seurat` directly if you so choose
-* To be more consistent with `Seurat` analysis pipelines, gene names in these objects use gene symbols rather than Ensembl ids
+* These `Seurat` objects include the same content as the `SingleCellExperiment` objects that they are derived from.
+This includes raw and normalized counts, annotations of highly variable genes, PCA and UMAP transformations, as well as cell and feature metadata.  
+  * Note that all calculations were performed using `Bioconductor` packages, so values will differ from the results obtained using `Seurat` functions from the same raw data.
+  * If `Seurat`-derived values are required, processing steps may need to be repeated.
+* To be more consistent with `Seurat` analysis pipelines, these objects use gene symbols rather than Ensembl ids as the row names and primary feature id.
 
 
 ### The ScPCA data objects contain Ensembl ids, but I need gene symbols for my analysis. How should I perform this conversion?
@@ -141,12 +142,12 @@ Installation instructions are provided in the `rOpenScPCA` GitHub repository.
 
 This package has two particular functions to support this task:
 
-* `rOpenScPCA::sce_to_symbols()`
-  * This function converts row names in a `SingleCellExperiment` object from Ensembl ids to gene symbols
 * `rOpenScPCA::ensembl_to_symbol()`
   * This function converts a vector of Ensembl ids to a vector of gene symbols
+* `rOpenScPCA::sce_to_symbols()`
+  * This function converts row names in a `SingleCellExperiment` object from Ensembl ids to gene symbols
 
-Please refer to these functions' help menus (e.g., `?rOpenScPCA::sce_to_symbols`) for additional information on their use.
+Please refer to these functions' help pages (e.g., `?rOpenScPCA::sce_to_symbols`) for additional information on their use, including options for handling duplicate gene symbols.
 
 ### I noticed there are cluster assignments in the processed data files. Should I use those or re-cluster the data?
 
@@ -160,5 +161,5 @@ To support clustering analysis and evaluation, we provide several functions in a
 * Evaluate clustering results with several quality control metrics
 * Calculate different sets of clustering results across parameter space in order to identify an optimal clustering scheme
 
-We also provide an OpenScPCA analysis module [`hello-clusters`](https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/hello-clusters) with example notebooks demonstrating how to use clustering functionality in `rOpenScPCA`.
+We also provide an OpenScPCA analysis module [`hello-clusters`](https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/hello-clusters) with example notebooks demonstrating how to use the clustering functionality in `rOpenScPCA`.
 This module also provides instructions on how to install `rOpenScPCA`.

From 0ff6ce7aaaa6b3b0a6d7587e75bd32ba12c42d9a Mon Sep 17 00:00:00 2001
From: Stephanie Spielman <stephanie.spielman@gmail.com>
Date: Thu, 16 Jan 2025 13:54:17 -0500
Subject: [PATCH 8/9] respond to reviews and try out one new phrasing bit

---
 docs/troubleshooting-faq/faq.md | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/docs/troubleshooting-faq/faq.md b/docs/troubleshooting-faq/faq.md
index 837c6ec02..eb6331800 100644
--- a/docs/troubleshooting-faq/faq.md
+++ b/docs/troubleshooting-faq/faq.md
@@ -20,7 +20,7 @@ Data files in each release are organized on S3 as:
 {Release}
     ├── {Project ID}
     │   └── {Sample ID}
-    │       └── {Library files}
+    │       └── {Library files}
     ├── bulk_metadata.tsv (if applicable)
     ├── bulk_quant.tsv (if applicable)
     └── single_cell_metadata.tsv
@@ -129,17 +129,16 @@ For more information on obtaining result files, please refer to the documentatio
 When working with these `Seurat` objects, please bear in mind the following:
 
 * These `Seurat` objects include the same content as the `SingleCellExperiment` objects that they are derived from.
-This includes raw and normalized counts, annotations of highly variable genes, PCA and UMAP transformations, as well as cell and feature metadata.  
+This includes raw and normalized counts, annotations of highly variable genes, PCA and UMAP transformations, as well as cell and feature metadata.
   * Note that all calculations were performed using `Bioconductor` packages, so values will differ from the results obtained using `Seurat` functions from the same raw data.
-  * If `Seurat`-derived values are required, processing steps may need to be repeated.
+  * If your analysis requires fields created by `Seurat` processing pipelines, you will need to perform those processing steps yourself with `Seurat`, starting from the raw counts.
 * To be more consistent with `Seurat` analysis pipelines, these objects use gene symbols rather than Ensembl ids as the row names and primary feature id.
 
 
 ### The ScPCA data objects contain Ensembl ids, but I need gene symbols for my analysis. How should I perform this conversion?
 
-In an effort to keep this consistent across the OpenScPCA project, we provide functions to convert Ensembl ids to gene symbols in an R package we maintain called [`rOpenScPCA`](https://github.com/AlexsLemonade/rOpenScPCA/).
-Installation instructions are provided in the `rOpenScPCA` GitHub repository.
-
+To keep this conversion consistent across the OpenScPCA project, we provide functions to convert Ensembl ids to gene symbols in an R package we maintain called `rOpenScPCA`.
+Installation instructions are provided in the [`rOpenScPCA` GitHub repository](https://github.com/AlexsLemonade/rOpenScPCA/?tab=readme-ov-file#installation).
 This package has two particular functions to support this task:
 
 * `rOpenScPCA::ensembl_to_symbol()`
@@ -147,7 +146,7 @@ This package has two particular functions to support this task:
 * `rOpenScPCA::sce_to_symbols()`
   * This function converts row names in a `SingleCellExperiment` object from Ensembl ids to gene symbols
 
-Please refer to these functions' help pages (e.g., `?rOpenScPCA::sce_to_symbols`) for additional information on their use, including options for handling duplicate gene symbols.
+Please refer to these functions' help pages (e.g., `?rOpenScPCA::sce_to_symbols`) for additional information on their use, including options for handling duplicate and/or missing gene symbols.
 
 ### I noticed there are cluster assignments in the processed data files. Should I use those or re-cluster the data?
 

From cdee8a46354223ade5364319e69a2e1eafff8dd8 Mon Sep 17 00:00:00 2001
From: Stephanie Spielman <stephanie.spielman@gmail.com>
Date: Fri, 17 Jan 2025 08:46:30 -0500
Subject: [PATCH 9/9] just use or

---
 docs/troubleshooting-faq/faq.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/troubleshooting-faq/faq.md b/docs/troubleshooting-faq/faq.md
index eb6331800..d65360312 100644
--- a/docs/troubleshooting-faq/faq.md
+++ b/docs/troubleshooting-faq/faq.md
@@ -146,7 +146,7 @@ This package has two particular functions to support this task:
 * `rOpenScPCA::sce_to_symbols()`
   * This function converts row names in a `SingleCellExperiment` object from Ensembl ids to gene symbols
 
-Please refer to these functions' help pages (e.g., `?rOpenScPCA::sce_to_symbols`) for additional information on their use, including options for handling duplicate and/or missing gene symbols.
+Please refer to these functions' help pages (e.g., `?rOpenScPCA::sce_to_symbols`) for additional information on their use, including options for handling duplicate or missing gene symbols.
 
 ### I noticed there are cluster assignments in the processed data files. Should I use those or re-cluster the data?