AlexsLemonade · sjspielman · Jan 15, 2025
@@ -15,6 +15,7 @@ ARI
 astrocytes
 AUCell
 authenticator
+Aynaud
 barcode
 barcodes
 Beligs
@@ -32,6 +33,7 @@ Carpentries
 CellAssign
 CELLxGENE
 chemotherapies
+chondrocyte
 chondrocytes
 chr
 CLI
@@ -70,6 +72,7 @@ dropdown
 DSRCT
 ECM
 ECR
+embeddings
 endothelia
 endothelial
 endothelium
@@ -90,6 +93,7 @@ fibroblast
 fibroblasts
 FLI
 formatters
+Franzetti
 Generis
 GFM
 GHA
@@ -162,6 +166,7 @@ monocyte
 monocytes
 mononuclear
 MSC
+MSigDB
 multifactor
 multinucleated
 myeloid
@@ -191,6 +196,7 @@ perivascular
 ploidy
 pluripotent
 programmatically
+proliferative
 PMID
 PNG
 podman
@@ -212,6 +218,7 @@ repo
 reproducibility
 reproducibly
 ribosomal
+Riggi
 RNAseq
 rOpenScPCA
 RStudio
@@ -269,10 +276,12 @@ uteric
 vCPU
 vCPUs
 Visser
+vitro
 VSCodium
 Wilms
 WIPO
 WisCon
+Wrenn
 WSL
 Xcode
 xenograft

@@ -10,6 +10,3 @@ Spellcheck found **{{ERROR_COUNT}} errors**.
 
 - [ ] Assign an OpenScPCA admin
 - [ ] Spell check errors have been fixed via a PR
-- [ ] Spell check workflow has been run to confirm spelling errors are fixed.<br>
-  If the PR branch has a name that starts with `spelling/` and the `dictionary.txt` file is updated, the spell check workflow will run automatically.
-  Otherwise the spell check workflow may need to be run manually.
@@ -50,43 +50,42 @@ The first column contains the cell barcode and the second contains the annotatio
 These files are specific for each library and depend on which cells are denoted as the reference.
 Each library contains a folder with any annotations file used to run `InferCNV` for that library.
 
-## Marker gene sets for identifying tumor cell states 
+## Marker gene sets for identifying tumor cell states
 
-The `tumor-cell-state-markers.tsv` file contains a list of marker genes that can be used to classify tumor cell states in Ewing samples. 
-The marker genes included here are specific to EWS-FLI1 high, EWS-FLI1 low, and proliferative tumor cells. 
-This list was obtained based on key genes mentioned in the following publications: 
+The `tumor-cell-state-markers.tsv` file contains a list of marker genes that can be used to classify tumor cell states in Ewing samples.
+The marker genes included here are specific to EWS-FLI1 high, EWS-FLI1 low, and proliferative tumor cells.
+This list was obtained based on key genes mentioned in the following publications:
 
 - [Goodspeed _et al._](https://doi.org/10.1101/2024.01.18.576251)
 - [Aynaud _et al._](https://doi.org/10.1016/j.celrep.2020.01.049)
 - [Wrenn _et al._](https://doi.org/10.1158/1078-0432.CCR-23-1111)
 - [Franzetti _et al._](https://doi.org/10.1038/onc.2016.498)
 - [Riggi _et al._](https://doi.org/10.1016/j.ccell.2014.10.004)
 
-### Gene signatures 
+### Gene signatures
 
-The `gene_signatures` folder contains any custom gene lists obtained from publications that can be used to identify tumor cell states: 
+The `gene_signatures` folder contains any custom gene lists obtained from publications that can be used to identify tumor cell states:
 
-1. `anyaud-ews-targets.tsv`: A list of the 78 marker genes defined by [Aynaud _et al._](https://doi.org/10.1016/j.celrep.2020.01.049) to be EWS-FLI1 targets. 
-Figure 4 shows that expression of these targets is correlated with EWS-FLI1 levels at a single-cell level. 
-We expect these targets to have increased expression in cells with high EWS-FLI1 activity. 
+1. `anyaud-ews-targets.tsv`: A list of the 78 marker genes defined by [Aynaud _et al._](https://doi.org/10.1016/j.celrep.2020.01.049) to be EWS-FLI1 targets.
+Figure 4 shows that expression of these targets is correlated with EWS-FLI1 levels at a single-cell level.
+We expect these targets to have increased expression in cells with high EWS-FLI1 activity.
 
-2. `wrenn-nt5e-genes.tsv`: A list of 28 genes from [Wrenn _et al._](https://doi.org/10.1158/1078-0432.CCR-23-1111) that represent the overlap between the top 217 genes correlated with _NT5E_ expression in patient tumors and the top 200 markers of _NT5E+_ Ewing sarcoma cells _in vitro_. 
+2. `wrenn-nt5e-genes.tsv`: A list of 28 genes from [Wrenn _et al._](https://doi.org/10.1158/1078-0432.CCR-23-1111) that represent the overlap between the top 217 genes correlated with _NT5E_ expression in patient tumors and the top 200 markers of _NT5E+_ Ewing sarcoma cells _in vitro_.
 These genes are shown in Figure 5D and 5E.
-We expect these targets to have increased expression in cells with low EWS-FLI1 activity. 
+We expect these targets to have increased expression in cells with low EWS-FLI1 activity.
 
-The following gene sets from MSigDB were also used to define EWS-FLI1 targets and may be helpful in defining cell states:  
+The following gene sets from MSigDB were also used to define EWS-FLI1 targets and may be helpful in defining cell states:
 
-- [STAEGE_EWING_FAMILY_TUMOR](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/STAEGE_EWING_FAMILY_TUMOR.html)
-- [MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_UP](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_UP.html)
-- [MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN.html)
-- [ZHANG_TARGETS_OF_EWSR1_FLI1_FUSION](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/ZHANG_TARGETS_OF_EWSR1_FLI1_FUSION.html)
-- [RIGGI_EWING_SARCOMA_PROGENITOR_UP](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/RIGGI_EWING_SARCOMA_PROGENITOR_UP.html) 
-- [RIGGI_EWING_SARCOMA_PROGENITOR_DN](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/RIGGI_EWING_SARCOMA_PROGENITOR_DN.html) 
-- [KINSEY_TARGETS_OF_EWSR1_FLI1_FUSION_UP](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP.html)
-- [KINSEY_TARGETS_OF_EWSR1_FLI1_FUSION_DN](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_DN.html)
+- [`STAEGE_EWING_FAMILY_TUMOR`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/STAEGE_EWING_FAMILY_TUMOR.html)
+- [`MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_UP`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_UP.html)
+- [`MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN.html)
+- [`ZHANG_TARGETS_OF_EWSR1_FLI1_FUSION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/ZHANG_TARGETS_OF_EWSR1_FLI1_FUSION.html)
+- [`RIGGI_EWING_SARCOMA_PROGENITOR_UP`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/RIGGI_EWING_SARCOMA_PROGENITOR_UP.html)
+- [`RIGGI_EWING_SARCOMA_PROGENITOR_DN`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/RIGGI_EWING_SARCOMA_PROGENITOR_DN.html)
+- [`KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP.html)
+- [`KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_DN`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_DN.html)
 
-Wrenn _et al._ also used found that the following additional gene sets were highly expressed in CD73 high, EWS-FLI1 low tumor cells: 
+Wrenn _et al._ also found that the following additional gene sets were highly expressed in CD73 high, EWS-FLI1 low tumor cells:
 
-- [HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION.html)
-GO:BP ECM Organization 
-- [GOBP_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION.html)
+- [`HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION.html)
+- [`GOBP_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION.html)
@@ -12,7 +12,7 @@ output:
 ## Introduction
 
 Clustering algorithms have several parameters which can be varied, leading to different clustering results.
-A key question when clustering, therefore, is how to identify a set of parameters that lead to robust and reliable clusters that can be used in downstream analysis. 
+A key question when clustering, therefore, is how to identify a set of parameters that lead to robust and reliable clusters that can be used in downstream analysis.
 
 This notebook provides examples of how to use the `rOpenScPCA` package to:
 
@@ -93,9 +93,9 @@ pca_matrix <- reducedDim(sce, "PCA")
 
 ## Varying a single clustering parameter
 
-This section will show how to perform clustering across a set of parameters (aka, "sweep" a set of parameters) with `rOpenScPCA::sweep_clusters()`. 
+This section will show how to perform clustering across a set of parameters (aka, "sweep" a set of parameters) with `rOpenScPCA::sweep_clusters()`.
 
-This function takes a PCA matrix with row names representing unique cell ids (e.g., barcodes) as its primary argument, with additional arguments for cluster parameters. 
+This function takes a PCA matrix with row names representing unique cell ids (e.g., barcodes) as its primary argument, with additional arguments for cluster parameters.
 This function wraps the `rOpenScPCA::calculate_clusters()` function but allows you to provide a vector of parameter values to perform clustering across, as listed below.
 Clusters will be calculated for all combinations of parameter values (where applicable); default values that the function will use for any unspecified parameter values are shown in parentheses.
 
@@ -105,10 +105,10 @@ Clusters will be calculated for all combinations of parameters values (where app
 * `resolution`: The resolution parameter (1; used only with Louvain and Leiden clustering)
 * `objective_function`: The objective function to optimize clusters (CPM; used only with Leiden clustering)
 
-`rOpenScPCA::sweep_clusters()` does not allow you to specify values for any other parameters. 
+`rOpenScPCA::sweep_clusters()` does not allow you to specify values for any other parameters.
 
 
-This function will return a list of data frames of clustering results. 
+This function will return a list of data frames of clustering results.
 Each data frame will have the following columns:
 
 * `cell_id`: Unique cell identifiers, obtained from the PCA matrix's row names
@@ -150,7 +150,7 @@ cluster_results_list |>
   purrr::map(head)
 ```
 
-Generally speaking, `purrr::map()` can be used to iterate over this list to visualize or analyze each clustering result on its own; we'll use this approach in the following sections. 
+Generally speaking, `purrr::map()` can be used to iterate over this list to visualize or analyze each clustering result on its own; we'll use this approach in the following sections.
 
 ### Visualizing clustering results
 
@@ -206,7 +206,7 @@ These plots show that the number of clusters decreases as the nearest neighbors
 ### Evaluating clustering results
 
 This section will use `purrr::map()` to iterate over each clustering result data frame to calculate silhouette width, neighborhood purity, and stability, and then visualize results.
-The goal of this code is to identify whether one clustering parameterization produces more reliable clusters. 
+The goal of this code is to identify whether one clustering parameterization produces more reliable clusters.
 
 
 #### Silhouette width and neighborhood purity
@@ -268,12 +268,12 @@ silhouette_plot + purity_plot & theme(legend.position = "none")
 ```
 
 While there does not appear to be a salient difference among silhouette width distributions, it does appear that purity is higher with a higher nearest neighbors parameter.
-It's worth noting that this trend in purity values is expected: Higher nearest neighbor parameter values lead to fewer clusters, and neighborhood purity tends to be higher when there are fewer clusters. 
+It's worth noting that this trend in purity values is expected: Higher nearest neighbor parameter values lead to fewer clusters, and neighborhood purity tends to be higher when there are fewer clusters.
 
 
 #### Stability
 
-Next, we'll calculate stability on the clusters using `rOpenScPCA::calculate_stability()`, specifying the same parameter used for the original cluster calculation at each iteration. 
+Next, we'll calculate stability on the clusters using `rOpenScPCA::calculate_stability()`, specifying the same parameter used for the original cluster calculation at each iteration.
 
 ```{r calculate stability}
 stability_list <- cluster_results_list |>
@@ -304,7 +304,7 @@ ggplot(stability_df) +
   theme(legend.position = "none")
 ```
 
-Here, we see that a nearest neighbors value of 20 or 30 leads to more stable clustering results compared to 10. 
+Here, we see that a nearest neighbors value of 20 or 30 leads to more stable clustering results compared to 10.
 
 
 ## Varying multiple clustering parameters
@@ -378,7 +378,7 @@ patchwork::wrap_plots(umap_plots, ncol = 3)
 
 This section presents one coding strategy to calculate and visualize results when varying two clustering parameters.
 In particular, we use faceting to help display all information in one plot, by placing nearest neighbor values on the X-axis and faceting by resolution values.
-Since silhouette width and neighhorbood purity calculations using generally similar code, we'll just show neighborhood purity here.
+Since silhouette width and neighborhood purity calculations use generally similar code, we'll just show neighborhood purity here.
 
 #### Neighborhood purity