AlexsLemonade · sjspielman · Dec 20, 2024 · Dec 20, 2024
@@ -2,8 +2,8 @@
 title: "Performing graph-based clustering with rOpenScPCA"
 date: "`r Sys.Date()`"
 author: "Data Lab"
-output: 
-  html_notebook: 
+output:
+  html_notebook:
     toc: yes
     toc_float: yes
     df_print: paged
@@ -76,7 +76,6 @@ set.seed(2024)
 ## Read in and prepare data
 
 To begin, we'll read in the `SingleCellExperiment` (SCE) object.
-We'll also establish a corresponding processed Seurat object from its raw counts that we'll use for some examples.
 
 ```{r read data}
 # Read the SCE file
@@ -94,7 +93,7 @@ pca_matrix <- reducedDim(sce, "PCA")
 
 ## Perform clustering
 
-This section will show how to perform clustering with the function `rOpenScPCA::calculate_clusters()`. 
+This section will show how to perform clustering with the function `rOpenScPCA::calculate_clusters()`.
 
 This function takes a PCA matrix with rownames representing unique cell ids (e.g., barcodes) as its primary argument.
 By default it will calculate clusters using the following parameters:
@@ -152,7 +151,7 @@ cluster_results_df <- rOpenScPCA::calculate_clusters(
 
 ## Calculate QC metrics on clusters
 
-This section demonstrates how to use several functions for evaluating cluster quality and reliability. 
+This section demonstrates how to use several functions for evaluating cluster quality and reliability.
 It's important to note that a full evaluation of clustering results would compare these metrics across a set of clustering results, with the aim of identifying an optimal parameterization.
 
 All functions presented in this section take the following required arguments:
@@ -236,7 +235,7 @@ ggplot(purity_results) +
 
 ### Cluster stability
 
-Another approach to exploring cluster quality is how stable the clusters themselves are using bootstrapping. 
+Another approach to exploring cluster quality is how stable the clusters themselves are using bootstrapping.
 Given a set of original clusters, we can compare the bootstrapped cluster identities to original ones using the Adjusted Rand Index (ARI), which measures the similarity of two data clusterings.
 ARI ranges from -1 to 1, where:
 
@@ -276,7 +275,7 @@ ggplot(stability_results) +
 
 #### Using non-default clustering parameters
 
-When calculating bootstrap clusters, `rOpenScPCA::calculate_stability()` uses `rOpenScPCA::calculate_clusters()` with default parameters. 
+When calculating bootstrap clusters, `rOpenScPCA::calculate_stability()` uses `rOpenScPCA::calculate_clusters()` with default parameters.
 If your original clusters were not calculated with these defaults, you should pass those customized values into this function as well to ensure a fair comparison between your original clusters and the bootstrap clusters.
 
 
@@ -331,7 +330,6 @@ If you are analyzing your data with a Seurat pipeline that includes calculating
 
 To demonstrate this, we'll convert our SCE object to a Seurat object using the function `rOpenScPCA::sce_to_seurat()`.
 Then, we'll use a simple Seurat pipeline to obtain clusters.
-<!-- TODO: We will want to reference this module for further documentation on this function: https://github.com/AlexsLemonade/OpenScPCA-analysis/issues/945 -->
 
 ```{r sce to seurat, message = FALSE}
 # Convert the SCE to a Seurat object using rOpenScPCA
@@ -380,8 +378,8 @@ We do not recommend using `rOpenScPCA::calculate_stability()` on Seurat clusters
 
 ### Evaluating ScPCA clusters
 
-ScPCA cell metadata already contains a column called `cluster` with results from an automated clustering. 
-These clusters were calculated using `bluster`, the same tool that `rOpenScPCA` uses. 
+ScPCA cell metadata already contains a column called `cluster` with results from an automated clustering.
+These clusters were calculated using `bluster`, the same tool that `rOpenScPCA` uses.
 The specifications used for this clustering are stored in the SCE object's metadata, as follows; note that all other clustering parameters were left at their default values.
 
 * `metadata(sce)$cluster_algorithm`: The clustering algorithm used
@@ -446,7 +444,7 @@ scpca_stability_df <- rOpenScPCA::calculate_stability(
 ```
 
 
-## Saving clustering results 
+## Saving clustering results
 
 Results can either be directly exported as a TSV file (e.g., with `readr::write_tsv()`), or you can add the results into your SCE or Seurat object.
 The subsequent examples will demonstrate saving the cluster assignments stored in `cluster_results_df$cluster` to an SCE and a Seurat object.
@@ -456,7 +454,7 @@ Objects from the ScPCA Portal already contain a column called `cluster` with res
 These automatic clusters were not evaluated, and their parameters were not optimized for any given library.
 To avoid ambiguity between the existing and new clustering results, we'll name the new column `ropenscpca_cluster`.
 
-### Saving results to an SCE object 
+### Saving results to an SCE object
 
 We can add columns to an SCE object's `colData` table by directly creating a column in the object with `$`.
 Before we do so, we'll confirm that the clusters are in the same order as the SCE object by comparing cell ids:
@@ -473,7 +471,7 @@ all.equal(
 sce$ropenscpca_cluster <- cluster_results_df$cluster
 ```
 
-### Saving results to a Seurat object 
+### Saving results to a Seurat object
 
 
 We can add columns to a Seurat object's cell metadata table by directly creating a column in the object with `$` (note that you can also use the Seurat function `AddMetaData()`).