Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script to run AUCell on all samples using EWS-FLI1 high/low gene signatures #998

Conversation

allyhawkins
Copy link
Member

@allyhawkins allyhawkins commented Jan 17, 2025

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

Closes #985

What is the goal of this pull request?

Here I am adding two new scripts, one to run AUCell on a single SCE object using a set of custom gene signatures for defining tumor cell states (mainly EWS-FLI1 high and EWS-FLI1 low) and a second script to run the first script on all samples in the Ewing project. Ultimately, we want to use the results from AUCell to help label cells based on the specific cell state they are in.

I am using the two custom gene sets that we have stored in references/gene_signatures that are marker gene lists for EWS-FLI1 high and low along with a set of MSigDB gene sets that we have identified from the literature as being potentially useful. It's pretty quick to run AUCell so I figured it wouldn't hurt to just use all the gene sets in that list.

I did not include genes in our two marker gene lists, visser-all-marker-genes.tsv and tumor-cell-state-markers.tsv since the gene lists for each cell type there are quite small and would probably skew the AUC results.

Briefly describe the general approach you took to achieve this goal.

  • Although we previously had a script for AUCell, it was for a pretty specific use case where we are defining tumor cells. I didn't want to have to make huge edits in that workflow and the code was different enough that I chose to make a new script for this.
  • This script takes in a single SCE object, the custom gene sets, and a percentage to use for determining the aucMaxRank, which I set to the default of 0.01.
  • The output of this script is a TSV file with AUC values for each cell and each gene set. I also chose to include the AUC threshold value that is reported by AUCell. I don't know that we will use it, but I think it could be helpful when we go to plot this data.
  • I then wrote a wrapper script that runs AUCell using these gene signatures on every sample in the project and saves each TSV to the results directory.
  • The last thing I did was update the documentation where necessary. I know this technically isn't a new "workflow", since we're just running one script, but I still followed a similar format in how I documented it and included it in the main README. Once we create our notebooks that go sample by sample, I would like to reorganize some of the things in this module and move things that aren't used anymore to a sub folder so that things are less crowded and easier to understand. But for now, I just added it to existing documentation.

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes see #993

Results

What is the name of your results bucket on S3?

s3://researcher-211125375652-us-east-2/cell-type-ewings/results/aucell-ews-signatures

What types of results does your code produce (e.g., table, figure)?

TSV files with the AUC values

What is your summary of the results?

Coming next!

Author checklists

Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.

Analysis module and review

Reproducibility checklist

  • Code in this pull request has been added to the GitHub Action workflow that runs this module.
  • The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
  • If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
  • If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

@allyhawkins allyhawkins requested review from sjspielman and removed request for jaclyn-taroni January 17, 2025 22:20
Copy link
Member

@sjspielman sjspielman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me, and I was able to run the script locally. Left some comments but nothing major, let me know if I can clarify!

## Using `AUCell` to calculate gene set signatures

The script, `run-aucell-ews-gene-signatures.sh`, is used to run `AUCell` with a set of custom gene signatures on all samples in SCPCP000015.
By default, AUC values for calculated for all gene signatures in [references/gene_signatures](../references/gene_signatures/) and a set of `MSigDB` signatures associated with high and low EWS-FLI1 expression.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
By default, AUC values for calculated for all gene signatures in [references/gene_signatures](../references/gene_signatures/) and a set of `MSigDB` signatures associated with high and low EWS-FLI1 expression.
By default, AUC values for calculated for all gene signatures in [references/gene_signatures](references/gene_signatures/) and a set of `MSigDB` signatures associated with high and low EWS-FLI1 expression.

analyses/cell-type-ewings/README.md Outdated Show resolved Hide resolved
analyses/cell-type-ewings/README.md Outdated Show resolved Hide resolved
analyses/cell-type-ewings/README.md Outdated Show resolved Hide resolved
analyses/cell-type-ewings/README.md Outdated Show resolved Hide resolved
analyses/cell-type-ewings/README.md Outdated Show resolved Hide resolved
analyses/cell-type-ewings/run-aucell-ews-signatures.sh Outdated Show resolved Hide resolved
Comment on lines 154 to 155
aucMaxRank = max_rank,
#BPPARAM = bp_param
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be commented out?

Comment on lines 106 to 116
# grab msigdb gene sets
ews_gene_sets <- c(
"staege" = "STAEGE_EWING_FAMILY_TUMOR",
"miyagawa_up" = "MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_UP",
"miyagawa_down" = "MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN",
"zhang"= "ZHANG_TARGETS_OF_EWSR1_FLI1_FUSION",
"riggi_up" = "RIGGI_EWING_SARCOMA_PROGENITOR_UP",
"riggi_down" = "RIGGI_EWING_SARCOMA_PROGENITOR_DN",
"kinsey_up" = "KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP",
"kinsey_down"= "KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_DN"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually wondering now whether we want to make this a stand-alone TSV file to read in here, and we can format it into a named vector in the script since I think that may be more maintainable should anything change here.

I'm also wondering about the Wrenn msigdb sets as seen in references/README.md.. Do we want to use those here?

- [`HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION.html)
- [`REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION.html)

@allyhawkins
Copy link
Member Author

I'm actually wondering now whether we want to make this a stand-alone TSV file to read in here, and we can format it into a named vector in the script since I think that may be more maintainable should anything change here.

I'm also wondering about the Wrenn msigdb sets as seen in references/README.md.. Do we want to use those here?

- [`HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION.html)
- [`REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION.html)

I updated this to include a TSV file msigdb-gene-sets.tsv that gets read in and used to grab the gene sets from MSigDB. I also added those two extra gene sets. To be honest, I just didn't want to deal with getting gene sets from different collections, but that's silly and we should include them which is easier to do with the TSV file! This involved some code updates to account for the fact that we have multiple collections, but nothing too crazy.

I re-ran the workflow and updated the results on S3 to reflect these changes. This should be ready for another look!

Copy link
Member

@sjspielman sjspielman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I caught a few bugs which need to be fixed and the code re-run accordingly, but otherwise I think this is good to go! I don't need to see this again unless you'd like to tag me back.

analyses/cell-type-ewings/run-aucell-ews-signatures.sh Outdated Show resolved Hide resolved
riggi_down RIGGI_EWING_SARCOMA_PROGENITOR_DN C2 CGP
kinsey_up KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP C2 CGP
kinsey_down KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_DN C2 CGP
hallmark_EMT HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION H
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we actually add NA for the subcategory here instead of leaving empty? It does get read as NA within R, but seems safer to actually write it out here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll change this but then will still keep the change to use NULL in the script.

analyses/cell-type-ewings/references/msigdb-gene-sets.tsv Outdated Show resolved Hide resolved
analyses/cell-type-ewings/run-aucell-ews-signatures.sh Outdated Show resolved Hide resolved
@allyhawkins
Copy link
Member Author

allyhawkins commented Jan 21, 2025

Noting that I made the updates requested, re-ran the script, and then re-uploaded results. I'll merge this in once checks pass!

@allyhawkins allyhawkins merged commit 51032a5 into AlexsLemonade:main Jan 22, 2025
3 checks passed
@allyhawkins allyhawkins deleted the allyhawkins/aucell-script-tumor-cell-states branch January 22, 2025 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Script to run AUCell on all marker gene sets in Ewings module
2 participants