-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Script to run AUCell on all samples using EWS-FLI1 high/low gene signatures #998
Script to run AUCell on all samples using EWS-FLI1 high/low gene signatures #998
Conversation
…cell-script-tumor-cell-states
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me, and I was able to run the script locally. Left some comments but nothing major, let me know if I can clarify!
analyses/cell-type-ewings/README.md
Outdated
## Using `AUCell` to calculate gene set signatures | ||
|
||
The script, `run-aucell-ews-gene-signatures.sh`, is used to run `AUCell` with a set of custom gene signatures on all samples in SCPCP000015. | ||
By default, AUC values for calculated for all gene signatures in [references/gene_signatures](../references/gene_signatures/) and a set of `MSigDB` signatures associated with high and low EWS-FLI1 expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default, AUC values for calculated for all gene signatures in [references/gene_signatures](../references/gene_signatures/) and a set of `MSigDB` signatures associated with high and low EWS-FLI1 expression. | |
By default, AUC values for calculated for all gene signatures in [references/gene_signatures](references/gene_signatures/) and a set of `MSigDB` signatures associated with high and low EWS-FLI1 expression. |
analyses/cell-type-ewings/exploratory_analysis/07-identify-max-rank-aucell.Rmd
Show resolved
Hide resolved
aucMaxRank = max_rank, | ||
#BPPARAM = bp_param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be commented out?
# grab msigdb gene sets | ||
ews_gene_sets <- c( | ||
"staege" = "STAEGE_EWING_FAMILY_TUMOR", | ||
"miyagawa_up" = "MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_UP", | ||
"miyagawa_down" = "MIYAGAWA_TARGETS_OF_EWSR1_ETS_FUSIONS_DN", | ||
"zhang"= "ZHANG_TARGETS_OF_EWSR1_FLI1_FUSION", | ||
"riggi_up" = "RIGGI_EWING_SARCOMA_PROGENITOR_UP", | ||
"riggi_down" = "RIGGI_EWING_SARCOMA_PROGENITOR_DN", | ||
"kinsey_up" = "KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP", | ||
"kinsey_down"= "KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_DN" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually wondering now whether we want to make this a stand-alone TSV file to read in here, and we can format it into a named vector in the script since I think that may be more maintainable should anything change here.
I'm also wondering about the Wrenn msigdb sets as seen in references/README.md
.. Do we want to use those here?
- [`HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION.html)
- [`REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION`](https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION.html)
Co-authored-by: Stephanie Spielman <[email protected]>
I updated this to include a TSV file I re-ran the workflow and updated the results on S3 to reflect these changes. This should be ready for another look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I caught a few bugs which need to be fixed and the code re-run accordingly, but otherwise I think this is good to go! I don't need to see this again unless you'd like to tag me back.
riggi_down RIGGI_EWING_SARCOMA_PROGENITOR_DN C2 CGP | ||
kinsey_up KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_UP C2 CGP | ||
kinsey_down KINSEY_TARGETS_OF_EWSR1_FLII_FUSION_DN C2 CGP | ||
hallmark_EMT HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually add NA
for the subcategory here instead of leaving empty? It does get read as NA
within R, but seems safer to actually write it out here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll change this but then will still keep the change to use NULL in the script.
Co-authored-by: Stephanie Spielman <[email protected]>
Noting that I made the updates requested, re-ran the script, and then re-uploaded results. I'll merge this in once checks pass! |
Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
Closes #985
What is the goal of this pull request?
Here I am adding two new scripts, one to run
AUCell
on a single SCE object using a set of custom gene signatures for defining tumor cell states (mainly EWS-FLI1 high and EWS-FLI1 low) and a second script to run the first script on all samples in the Ewing project. Ultimately, we want to use the results from AUCell to help label cells based on the specific cell state they are in.I am using the two custom gene sets that we have stored in
references/gene_signatures
that are marker gene lists for EWS-FLI1 high and low along with a set of MSigDB gene sets that we have identified from the literature as being potentially useful. It's pretty quick to runAUCell
so I figured it wouldn't hurt to just use all the gene sets in that list.I did not include genes in our two marker gene lists,
visser-all-marker-genes.tsv
andtumor-cell-state-markers.tsv
since the gene lists for each cell type there are quite small and would probably skew the AUC results.Briefly describe the general approach you took to achieve this goal.
aucMaxRank
, which I set to the default of 0.01.AUCell
. I don't know that we will use it, but I think it could be helpful when we go to plot this data.If known, do you anticipate filing additional pull requests to complete this analysis module?
Yes see #993
Results
What is the name of your results bucket on S3?
s3://researcher-211125375652-us-east-2/cell-type-ewings/results/aucell-ews-signatures
What types of results does your code produce (e.g., table, figure)?
TSV files with the AUC values
What is your summary of the results?
Coming next!
Author checklists
Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.
Analysis module and review
README.md
has been updated to reflect code changes in this pull request.Reproducibility checklist
Dockerfile
.environment.yml
file.renv.lock
file.