Small hello-clusters notebook cleanups #958

Merged
merged 1 commit into from
Dec 20, 2024
24 changes: 11 additions & 13 deletions analyses/hello-clusters/01_perform-evaluate-clustering.Rmd
@@ -2,8 +2,8 @@
title: "Performing graph-based clustering with rOpenScPCA"
date: "`r Sys.Date()`"
author: "Data Lab"
output:
  html_notebook:
    toc: yes
    toc_float: yes
    df_print: paged
@@ -76,7 +76,6 @@ set.seed(2024)
## Read in and prepare data

To begin, we'll read in the `SingleCellExperiment` (SCE) object.
We'll also establish a corresponding processed Seurat object from its raw counts that we'll use for some examples.

```{r read data}
# Read the SCE file
@@ -94,7 +93,7 @@ pca_matrix <- reducedDim(sce, "PCA")

## Perform clustering

This section will show how to perform clustering with the function `rOpenScPCA::calculate_clusters()`.

This function takes a PCA matrix with rownames representing unique cell ids (e.g., barcodes) as its primary argument.
By default it will calculate clusters using the following parameters:
@@ -152,7 +151,7 @@ cluster_results_df <- rOpenScPCA::calculate_clusters(

## Calculate QC metrics on clusters

This section demonstrates how to use several functions for evaluating cluster quality and reliability.
It's important to note that a full evaluation of clustering results would compare these metrics across a set of clustering results, with the aim of identifying an optimal parameterization.

All functions presented in this section take the following required arguments:
@@ -236,7 +235,7 @@ ggplot(purity_results) +

### Cluster stability

Another approach to exploring cluster quality is to use bootstrapping to assess how stable the clusters themselves are.
Given a set of original clusters, we can compare the bootstrapped cluster identities to original ones using the Adjusted Rand Index (ARI), which measures the similarity of two data clusterings.
ARI ranges from -1 to 1, where:

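As a rough illustration of the ARI mentioned in this hunk, the sketch below computes it in base R from a contingency table of two label vectors. The function name `adjusted_rand_index()` and the toy label vectors are hypothetical and exist only for this example; they are not part of `rOpenScPCA`, which performs its own calculation inside `rOpenScPCA::calculate_stability()`.

```r
# Hypothetical helper (not part of rOpenScPCA): ARI from two label vectors
adjusted_rand_index <- function(labels_a, labels_b) {
  stopifnot(length(labels_a) == length(labels_b))

  # Cross-tabulate the two sets of cluster assignments
  contingency <- table(labels_a, labels_b)

  # Count pairs of cells that fall together in each cell, row, and column
  sum_cells <- sum(choose(as.numeric(contingency), 2))
  sum_rows <- sum(choose(as.numeric(rowSums(contingency)), 2))
  sum_cols <- sum(choose(as.numeric(colSums(contingency)), 2))
  total_pairs <- choose(length(labels_a), 2)

  # Agreement expected by chance, and the maximum possible agreement
  expected_index <- sum_rows * sum_cols / total_pairs
  max_index <- (sum_rows + sum_cols) / 2

  # Note: this is NaN when both vectors place every cell in a single cluster
  (sum_cells - expected_index) / (max_index - expected_index)
}

# Toy labels: the same partition under different cluster names
original_clusters <- c(1, 1, 1, 2, 2, 2)
relabeled_clusters <- c("b", "b", "b", "a", "a", "a")
ari_same <- adjusted_rand_index(original_clusters, relabeled_clusters)

# Toy labels: two partitions that never group the same pair of cells together
ari_different <- adjusted_rand_index(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

Because ARI compares which cells are grouped together rather than the cluster names, the relabeled partition scores as identical to the original, while the second comparison scores below zero.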
@@ -276,7 +275,7 @@ ggplot(stability_results) +

#### Using non-default clustering parameters

When calculating bootstrap clusters, `rOpenScPCA::calculate_stability()` uses `rOpenScPCA::calculate_clusters()` with default parameters.
If your original clusters were not calculated with these defaults, you should pass those customized values into this function as well to ensure a fair comparison between your original clusters and the bootstrap clusters.


@@ -331,7 +330,6 @@ If you are analyzing your data with a Seurat pipeline that includes calculating

To demonstrate this, we'll convert our SCE object to a Seurat object using the function `rOpenScPCA::sce_to_seurat()`.
Then, we'll use a simple Seurat pipeline to obtain clusters.
<!-- TODO: We will want to reference this module for further documentation on this function: https://github.com/AlexsLemonade/OpenScPCA-analysis/issues/945 -->

```{r sce to seurat, message = FALSE}
# Convert the SCE to a Seurat object using rOpenScPCA
@@ -380,8 +378,8 @@ We do not recommend using `rOpenScPCA::calculate_stability()` on Seurat clusters

### Evaluating ScPCA clusters

ScPCA cell metadata already contains a column called `cluster` with results from an automated clustering.
These clusters were calculated using `bluster`, the same tool that `rOpenScPCA` uses.
The specifications used for this clustering are stored in the SCE object's metadata, as follows; note that all other clustering parameters were left at their default values.

* `metadata(sce)$cluster_algorithm`: The clustering algorithm used
@@ -446,7 +444,7 @@ scpca_stability_df <- rOpenScPCA::calculate_stability(
```


## Saving clustering results

Results can either be directly exported as a TSV file (e.g., with `readr::write_tsv()`), or you can add the results into your SCE or Seurat object.
The subsequent examples will demonstrate saving the cluster assignments stored in `cluster_results_df$cluster` to an SCE and a Seurat object.
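For the TSV route mentioned above, a minimal sketch might look like the following. The toy data frame, its `cell_id` column name, and the output path are assumptions made for illustration; in the notebook, the data frame to export would be `cluster_results_df`.

```r
library(readr)

# Hypothetical stand-in for `cluster_results_df`; the column name `cell_id`
# is an assumption for this example
toy_cluster_df <- data.frame(
  cell_id = c("AAAC", "AAAG", "AAAT"),
  cluster = factor(c(1, 2, 1))
)

# Write to a temporary location; in practice, choose a path in your project
output_file <- file.path(tempdir(), "cluster_results.tsv")
readr::write_tsv(toy_cluster_df, output_file)

# Read the file back to confirm that the export round-trips
reloaded_df <- readr::read_tsv(output_file, show_col_types = FALSE)
```

Factor columns are written as plain text, so the `cluster` column is not a factor when read back unless you specify `col_types` or convert it afterwards.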
@@ -456,7 +454,7 @@ Objects from the ScPCA Portal already contain a column called `cluster` with res
These automatic clusters were not evaluated, and their parameters were not optimized for any given library.
To avoid ambiguity between the existing and new clustering results, we'll name the new column `ropenscpca_cluster`.

### Saving results to an SCE object

We can add columns to an SCE object's `colData` table by directly creating a column in the object with `$`.
Before we do so, we'll confirm that the clusters are in the same order as the SCE object by comparing cell ids:
@@ -473,7 +471,7 @@ all.equal(
sce$ropenscpca_cluster <- cluster_results_df$cluster
```

### Saving results to a Seurat object


We can add columns to a Seurat object's cell metadata table by directly creating a column in the object with `$` (note that you can also use the Seurat function `AddMetaData()`).
55 changes: 30 additions & 25 deletions analyses/hello-clusters/01_perform-evaluate-clustering.nb.html

Large diffs are not rendered by default.