diff --git a/analyses/hello-clusters/01_perform-evaluate-clustering.Rmd b/analyses/hello-clusters/01_perform-evaluate-clustering.Rmd
index d7f2fbb55..80268a302 100644
--- a/analyses/hello-clusters/01_perform-evaluate-clustering.Rmd
+++ b/analyses/hello-clusters/01_perform-evaluate-clustering.Rmd
@@ -2,8 +2,8 @@
 title: "Performing graph-based clustering with rOpenScPCA"
 date: "`r Sys.Date()`"
 author: "Data Lab"
-output:
-  html_notebook:
+output:
+  html_notebook:
     toc: yes
     toc_float: yes
     df_print: paged
@@ -76,7 +76,6 @@ set.seed(2024)
 ## Read in and prepare data
 
 To begin, we'll read in the `SingleCellExperiment` (SCE) object.
-We'll also establish a corresponding processed Seurat object from its raw counts that we'll use for some examples.
 
 ```{r read data}
 # Read the SCE file
@@ -94,7 +93,7 @@ pca_matrix <- reducedDim(sce, "PCA")
 
 ## Perform clustering
 
-This section will show how to perform clustering with the function `rOpenScPCA::calculate_clusters()`.
+This section will show how to perform clustering with the function `rOpenScPCA::calculate_clusters()`.
 This function takes a PCA matrix with rownames representing unique cell ids (e.g., barcodes) as its primary argument.
 
 By default it will calculate clusters using the following parameters:
@@ -152,7 +151,7 @@ cluster_results_df <- rOpenScPCA::calculate_clusters(
 
 ## Calculate QC metrics on clusters
 
-This section demonstrates how to use several functions for evaluating cluster quality and reliability.
+This section demonstrates how to use several functions for evaluating cluster quality and reliability.
 It's important to note that a full evaluation of clustering results would compare these metrics across a set of clustering results, with the aim of identifying an optimal parameterization.
 
 All functions presented in this section take the following required arguments:
@@ -236,7 +235,7 @@ ggplot(purity_results) +
 
 ### Cluster stability
 
-Another approach to exploring cluster quality is how stable the clusters themselves are using bootstrapping.
+Another approach to exploring cluster quality is how stable the clusters themselves are using bootstrapping.
 Given a set of original clusters, we can compare the bootstrapped cluster identities to original ones using the Adjusted Rand Index (ARI), which measures the similarity of two data clusterings.
 ARI ranges from -1 to 1, where:
 
@@ -276,7 +275,7 @@ ggplot(stability_results) +
 
 #### Using non-default clustering parameters
 
-When calculating bootstrap clusters, `rOpenScPCA::calculate_stability()` uses `rOpenScPCA::calculate_clusters()` with default parameters.
+When calculating bootstrap clusters, `rOpenScPCA::calculate_stability()` uses `rOpenScPCA::calculate_clusters()` with default parameters.
 If your original clusters were not calculated with these defaults, you should pass those customized values into this function as well to ensure a fair comparison between your original clusters and the bootstrap clusters.
 
 
@@ -331,7 +330,6 @@ If you are analyzing your data with a Seurat pipeline that includes calculating
 
 To demonstrate this, we'll convert our SCE object to a Seurat using the function `rOpenScPCA::sce_to_seurat()`.
 Then, we'll use a simple Seurat pipeline to obtain clusters.
-
 
 ```{r sce to seurat, message = FALSE}
 # Convert the SCE to a Seurat object using rOpenScPCA
@@ -380,8 +378,8 @@ We do not recommend using `rOpenScPCA::calculate_stability()` on Seurat clusters
 
 ### Evaluating ScPCA clusters
 
-ScPCA cell metadata already contains a column called `cluster` with results from an automated clustering.
-These clusters were calculated using `bluster`, the same tool that `rOpenScPCA` uses.
+ScPCA cell metadata already contains a column called `cluster` with results from an automated clustering.
+These clusters were calculated using `bluster`, the same tool that `rOpenScPCA` uses.
 The specifications used for this clustering are stored in the SCE object's metadata, as follows; note that all other clustering parameters were left at their default values.
 
 * `metadata(sce)$cluster_algorithm`: The clustering algorithm used
@@ -446,7 +444,7 @@ scpca_stability_df <- rOpenScPCA::calculate_stability(
 ```
 
 
-## Saving clustering results
+## Saving clustering results
 
 Results can either be directly exported as a TSV file (e.g., with `readr::write_tsv()`), or you can add the results into your SCE or Seurat object.
 The subsequent examples will demonstrate saving the cluster assignments stored in `cluster_results_df$cluster` to an SCE and a Seurat object.
@@ -456,7 +454,7 @@ Objects from the ScPCA Portal already contain a column called `cluster` with res
 These automatic clusters were not evaluated, and their parameters were not optimized for any given library.
 To avoid ambiguity between the existing and new clustering results, we'll name the new column `ropenscpca_cluster`.
 
-### Saving results to an SCE object
+### Saving results to an SCE object
 
 We can add columns to an SCE object's `colData` table by directly creating a column in the object with `$`.
 Before we do so, we'll confirm that the clusters are in the same order as the SCE object by comparing cell ids:
@@ -473,7 +471,7 @@ all.equal(
 sce$ropenscpca_cluster <- cluster_results_df$cluster
 ```
 
-### Saving results to a Seurat object
+### Saving results to a Seurat object
 
 We can add columns to an Seurat object's cell metadata table by directly creating a column in the object with `$` (note that you can also use the Seurat function `AddMetaData()`).
 
diff --git a/analyses/hello-clusters/01_perform-evaluate-clustering.nb.html b/analyses/hello-clusters/01_perform-evaluate-clustering.nb.html
index 702a46da2..bc5d1fd95 100644
--- a/analyses/hello-clusters/01_perform-evaluate-clustering.nb.html
+++ b/analyses/hello-clusters/01_perform-evaluate-clustering.nb.html
@@ -11,7 +11,7 @@
-
+
 Performing graph-based clustering with rOpenScPCA
@@ -2901,7 +2901,7 @@

Performing graph-based clustering with rOpenScPCA

Data Lab

-2024-12-17
+2024-12-20

@@ -3005,8 +3005,7 @@

Set the random seed

Read in and prepare data

 To begin, we’ll read in the SingleCellExperiment (SCE)
-object. We’ll also establish a corresponding processed Seurat object
-from its raw counts that we’ll use for some examples.
+object.

@@ -3062,7 +3061,7 @@

Clustering with default parameters

@@ -3175,7 +3174,7 @@

Silhouette width

@@ -3191,7 +3190,7 @@

Silhouette width

labs(x = "Cluster", y = "Silhouette width") -

+

@@ -3228,7 +3227,7 @@

Neighborhood purity

@@ -3244,7 +3243,7 @@

Neighborhood purity

labs(x = "Cluster", y = "Neighborhood purity") -

+

@@ -3286,7 +3285,7 @@

Cluster stability

@@ -3302,7 +3301,7 @@

Cluster stability

labs(x = "Adjusted rand index across bootstrap replicates") -

+

@@ -3360,7 +3359,7 @@

Working with objects directly

@@ -3385,8 +3384,7 @@

Evaluating Seurat clusters

them.

 To demonstrate this, we’ll convert our SCE object to a Seurat using the function rOpenScPCA::sce_to_seurat(). Then, we’ll use a
-simple Seurat pipeline to obtain clusters.
-
+simple Seurat pipeline to obtain clusters.

@@ -3406,24 +3404,31 @@

Evaluating Seurat clusters

 FindNeighbors() |> FindClusters()
-
+
+Warning in irlba(A = t(x = object), nv = npcs, ...): You're computing too large
+a percentage of total singular values, use a standard svd instead.
+
+
+Warning: Number of dimensions changing from 10 to 50
+
+
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
 
-Number of nodes: 2623
-Number of edges: 78853
+Number of nodes: 100
+Number of edges: 4142
 
 Running Louvain algorithm...
-Maximum modularity in 10 random starts: 0.8478
-Number of communities: 13
+Maximum modularity in 10 random starts: 0.2147
+Number of communities: 2
 Elapsed time: 0 seconds
seurat_obj
- +
An object of class Seurat 
-145743 features across 2623 samples within 3 assays 
-Active assay: SCT (25105 features, 3000 variable features)
+126242 features across 100 samples within 3 assays 
+Active assay: SCT (5604 features, 3000 variable features)
  3 layers present: counts, data, scale.data
  2 other assays present: RNA, spliced
  2 dimensional reductions calculated: pca, umap
@@ -3446,7 +3451,7 @@

Evaluating Seurat clusters

@@ -3538,7 +3543,7 @@

Evaluating ScPCA clusters

@@ -3771,7 +3776,7 @@

Session Info

-

+
