Fix spelling #988

Merged
merged 4 commits into from
Jan 15, 2025
9 changes: 9 additions & 0 deletions .github/components/dictionary.txt
@@ -15,6 +15,7 @@ ARI
astrocytes
AUCell
authenticator
Aynaud
barcode
barcodes
Beligs
@@ -32,6 +33,7 @@ Carpentries
CellAssign
CELLxGENE
chemotherapies
chondrocyte
chondrocytes
chr
CLI
@@ -70,6 +72,7 @@ dropdown
DSRCT
ECM
ECR
embeddings
endothelia
endothelial
endothelium
@@ -90,6 +93,7 @@ fibroblast
fibroblasts
FLI
formatters
Franzetti
Generis
GFM
GHA
@@ -162,6 +166,7 @@ monocyte
monocytes
mononuclear
MSC
MSigDB
multifactor
multinucleated
myeloid
@@ -191,6 +196,7 @@ perivascular
ploidy
pluripotent
programmatically
proliferative
PMID
PNG
podman
@@ -212,6 +218,7 @@ repo
reproducibility
reproducibly
ribosomal
Riggi
RNAseq
rOpenScPCA
RStudio
@@ -269,10 +276,12 @@ uteric
vCPU
vCPUs
Visser
vitro
VSCodium
Wilms
WIPO
WisCon
Wrenn
WSL
Xcode
xenograft
3 changes: 0 additions & 3 deletions .github/cron-issue-templates/spellcheck-issue-template.md
@@ -10,6 +10,3 @@ Spellcheck found **{{ERROR_COUNT}} errors**.

- [ ] Assign an OpenScPCA admin
- [ ] Spell check errors have been fixed via a PR
- [ ] Spell check workflow has been run to confirm spelling errors are fixed.<br>
If the PR branch has a name that starts with `spelling/` and the `dictionary.txt` file is updated, the spell check workflow will run automatically.
Otherwise the spell check workflow may need to be run manually.
47 changes: 23 additions & 24 deletions analyses/cell-type-ewings/references/README.md
@@ -50,43 +50,42 @@ The first column contains the cell barcode and the second contains the annotatio
These files are specific for each library and depend on which cells are denoted as the reference.
Each library contains a folder with any annotations file used to run `InferCNV` for that library.

## Marker gene sets for identifying tumor cell states
## Marker gene sets for identifying tumor cell states

The `tumor-cell-state-markers.tsv` file contains a list of marker genes that can be used to classify tumor cell states in Ewing samples.
The marker genes included here are specific to EWS-FLI1 high, EWS-FLI1 low, and proliferative tumor cells.
This list was obtained based on key genes mentioned in the following publications:
The `tumor-cell-state-markers.tsv` file contains a list of marker genes that can be used to classify tumor cell states in Ewing samples.
The marker genes included here are specific to EWS-FLI1 high, EWS-FLI1 low, and proliferative tumor cells.
This list was obtained based on key genes mentioned in the following publications:

- [Goodspeed _et al._](https://doi.org/10.1101/2024.01.18.576251)
- [Aynaud _et al._](https://doi.org/10.1016/j.celrep.2020.01.049)
- [Wrenn _et al._](https://doi.org/10.1158/1078-0432.CCR-23-1111)
- [Franzetti _et al._](https://doi.org/10.1038/onc.2016.498)
- [Riggi _et al._](https://doi.org/10.1016/j.ccell.2014.10.004)

### Gene signatures
### Gene signatures

The `gene_signatures` folder contains any custom gene lists obtained from publications that can be used to identify tumor cell states:
The `gene_signatures` folder contains any custom gene lists obtained from publications that can be used to identify tumor cell states:

1. `anyaud-ews-targets.tsv`: A list of the 78 marker genes defined by [Aynaud _et al._](https://doi.org/10.1016/j.celrep.2020.01.049) to be EWS-FLI1 targets.
Figure 4 shows that expression of these targets is correlated with EWS-FLI1 levels at a single-cell level.
We expect these targets to have increased expression in cells with high EWS-FLI1 activity.
1. `anyaud-ews-targets.tsv`: A list of the 78 marker genes defined by [Aynaud _et al._](https://doi.org/10.1016/j.celrep.2020.01.049) to be EWS-FLI1 targets.
Figure 4 shows that expression of these targets is correlated with EWS-FLI1 levels at a single-cell level.
We expect these targets to have increased expression in cells with high EWS-FLI1 activity.

2. `wrenn-nt5e-genes.tsv`: A list of 28 genes from [Wrenn _et al._](https://doi.org/10.1158/1078-0432.CCR-23-1111) that represent the overlap between the top 217 genes correlated with _NT5E_ expression in patient tumors and the top 200 markers of _NT5E+_ Ewing sarcoma cells _in vitro_.
2. `wrenn-nt5e-genes.tsv`: A list of 28 genes from [Wrenn _et al._](https://doi.org/10.1158/1078-0432.CCR-23-1111) that represent the overlap between the top 217 genes correlated with _NT5E_ expression in patient tumors and the top 200 markers of _NT5E+_ Ewing sarcoma cells _in vitro_.
These genes are shown in Figure 5D and 5E.
We expect these targets to have increased expression in cells with low EWS-FLI1 activity.
We expect these targets to have increased expression in cells with low EWS-FLI1 activity.

The following gene sets from MSigDB were also used to define EWS-FLI1 targets and may be helpful in defining cell states:
The following gene sets from MSigDB were also used to define EWS-FLI1 targets and may be helpful in defining cell states:

- [STAEGE_EWING_FAMILY_TUMOR](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/STAEGE_EWING_FAMILY_TUMOR.html)
- [MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_UP](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_UP.html)
- [MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN.html)
- [ZHANG_TARGETS_OF_EWSR1_FLI1_FUSION](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/ZHANG_TARGETS_OF_EWSR1_FLI1_FUSION.html)
- [RIGGI_EWING_SARCOMA_PROGENITOR_UP](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/RIGGI_EWING_SARCOMA_PROGENITOR_UP.html)
- [RIGGI_EWING_SARCOMA_PROGENITOR_DN](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/RIGGI_EWING_SARCOMA_PROGENITOR_DN.html)
- [KINSEY_TARGETS_OF_EWSR1_FLI1_FUSION_UP](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP.html)
- [KINSEY_TARGETS_OF_EWSR1_FLI1_FUSION_DN](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_DN.html)
- [`STAEGE_EWING_FAMILY_TUMOR`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/STAEGE_EWING_FAMILY_TUMOR.html)
- [`MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_UP`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_UP.html)
- [`MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN.html)
- [`ZHANG_TARGETS_OF_EWSR1_FLI1_FUSION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/ZHANG_TARGETS_OF_EWSR1_FLI1_FUSION.html)
- [`RIGGI_EWING_SARCOMA_PROGENITOR_UP`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/RIGGI_EWING_SARCOMA_PROGENITOR_UP.html)
- [`RIGGI_EWING_SARCOMA_PROGENITOR_DN`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/RIGGI_EWING_SARCOMA_PROGENITOR_DN.html)
- [`KINSEY_TARGETS_OF_EWSR1_FLI1_FUSION_UP`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP.html)
- [`KINSEY_TARGETS_OF_EWSR1_FLI1_FUSION_DN`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_DN.html)

Wrenn _et al._ also used found that the following additional gene sets were highly expressed in CD73 high, EWS-FLI1 low tumor cells:
Wrenn _et al._ also found that the following additional gene sets were highly expressed in CD73 high, EWS-FLI1 low tumor cells:

- [HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION.html)
GO:BP ECM Organization
- [GOBP_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION.html)
- [`HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION.html)
- [`REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION.html)
22 changes: 11 additions & 11 deletions analyses/hello-clusters/02_compare-clustering-parameters.Rmd
@@ -12,7 +12,7 @@ output:
## Introduction

Clustering algorithms have several parameters which can be varied, leading to different clustering results.
A key question when clustering, therefore, is how to identify a set of parameters that lead to robust and reliable clusters that can be used in downstream analysis.
A key question when clustering, therefore, is how to identify a set of parameters that lead to robust and reliable clusters that can be used in downstream analysis.

This notebook provides examples of how to use the `rOpenScPCA` package to:

@@ -93,9 +93,9 @@ pca_matrix <- reducedDim(sce, "PCA")

## Varying a single clustering parameter

This section will show how to perform clustering across a set of parameters (aka, "sweep" a set of parameters) with `rOpenScPCA::sweep_clusters()`.
This section will show how to perform clustering across a set of parameters (aka, "sweep" a set of parameters) with `rOpenScPCA::sweep_clusters()`.

This function takes a PCA matrix with row names representing unique cell ids (e.g., barcodes) as its primary argument, with additional arguments for cluster parameters.
This function takes a PCA matrix with row names representing unique cell ids (e.g., barcodes) as its primary argument, with additional arguments for cluster parameters.
This function wraps the `rOpenScPCA::calculate_clusters()` function but allows you to provide a vector of parameter values to perform clustering across, as listed below.
Clusters will be calculated for all combinations of parameters values (where applicable); default values that the function will use for any unspecified parameter values are shown in parentheses.

@@ -105,10 +105,10 @@ Clusters will be calculated for all combinations of parameters values (where app
* `resolution`: The resolution parameter (1; used only with Louvain and Leiden clustering)
* `objective_function`: The objective function to optimize clusters (CPM; used only with Leiden clustering)

`rOpenScPCA::sweep_clusters()` does not allow you to specify values for any other parameters.
`rOpenScPCA::sweep_clusters()` does not allow you to specify values for any other parameters.
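
As a rough illustration of what "all combinations of parameter values" means, the base R sketch below enumerates a parameter grid with `expand.grid()`. It does not call `rOpenScPCA`, and it is not a description of how `sweep_clusters()` is implemented. The column names and the particular values are assumptions chosen for the example.

```r
# Hypothetical parameter values, chosen only for illustration
nearest_neighbors_values <- c(10, 20, 30)
resolution_values <- c(0.5, 1)

# expand.grid() builds a data frame with one row per combination
# of the supplied vectors
param_grid <- expand.grid(
  nearest_neighbors = nearest_neighbors_values,
  resolution = resolution_values
)

# 3 nearest-neighbor values x 2 resolution values = 6 combinations,
# so a sweep over this grid would yield 6 clustering results
nrow(param_grid)
```

A sweep over two parameters therefore grows multiplicatively, which is worth keeping in mind when choosing how many values to vary.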


This function will return a list of data frames of clustering results.
This function will return a list of data frames of clustering results.
Each data frame will have the following columns:

* `cell_id`: Unique cell identifiers, obtained from the PCA matrix's row names
@@ -150,7 +150,7 @@ cluster_results_list |>
purrr::map(head)
```

Generally speaking, `purrr::map()` can be used to iterate over this list to visualize or analyze each clustering result on its own; we'll use this approach in the following sections.
Generally speaking, `purrr::map()` can be used to iterate over this list to visualize or analyze each clustering result on its own; we'll use this approach in the following sections.

### Visualizing clustering results

@@ -206,7 +206,7 @@ These plots show that the number of clusters decreases as the nearest neighbors
### Evaluating clustering results

This section will use `purrr::map()` to iterate over each clustering result data frame to calculate silhouette width, neighborhood purity, and stability, and then visualize results.
The goal of this code is to identify whether one clustering parameterization produces more reliable clusters.
The goal of this code is to identify whether one clustering parameterization produces more reliable clusters.


#### Silhouette width and neighborhood purity
@@ -268,12 +268,12 @@ silhouette_plot + purity_plot & theme(legend.position = "none")
```

While there does not appear to be a salient difference among silhouette width distributions, it does appear that purity is higher with a higher nearest neighbors parameter.
It's worth noting that this trend in purity values is expected: Higher nearest neighbor parameter values lead to fewer clusters, and neighborhood purity tends to be higher when there are fewer clusters.
It's worth noting that this trend in purity values is expected: Higher nearest neighbor parameter values lead to fewer clusters, and neighborhood purity tends to be higher when there are fewer clusters.


#### Stability

Next, we'll calculate stability on the clusters using `rOpenScPCA::calculate_stability()`, specifying the same parameter used for the original cluster calculation at each iteration.
Next, we'll calculate stability on the clusters using `rOpenScPCA::calculate_stability()`, specifying the same parameter used for the original cluster calculation at each iteration.

```{r calculate stability}
stability_list <- cluster_results_list |>
@@ -304,7 +304,7 @@ ggplot(stability_df) +
theme(legend.position = "none")
```

Here, we see that a nearest neighbors value of 20 or 30 leads to more stable clustering results compared to 10.
Here, we see that a nearest neighbors value of 20 or 30 leads to more stable clustering results compared to 10.


## Varying multiple clustering parameters
@@ -378,7 +378,7 @@ patchwork::wrap_plots(umap_plots, ncol = 3)

This section presents one coding strategy to calculate and visualize results when varying two clustering parameters.
In particular, we use faceting to help display all information in one plot, by placing nearest neighbor values on the X-axis and faceting by resolution values.
Since silhouette width and neighhorbood purity calculations using generally similar code, we'll just show neighborhood purity here.
Since silhouette width and neighborhood purity calculations use generally similar code, we'll just show neighborhood purity here.

#### Neighborhood purity
